Leinamycin biosynthesis gene cluster and its components and their uses

ABSTRACT

This invention provides detailed sequence analysis and characterization of the gene cluster responsible for the synthesis of leinamycin in Streptomyces atroolivaceus. The leinamycin gene cluster provides a hybrid polyketide synthase/nonribosomal peptide synthetase pathway. Elucidation of the various modules and enzymatic domains characterizing the pathway provides convenient synthetic routes for leinamycins, leinamycin analogs, and various other polyketides, peptides, and hybrid peptide-polyketide natural products.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims benefit of and priority to U.S. Ser. No. 60/278,935, filed on Mar. 26, 2001, which is incorporated herein by reference in its entirety for all purposes.

STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER FEDERALLY SPONSORED RESEARCH AND DEVELOPMENT

Not Applicable

FIELD OF THE INVENTION

This invention relates to the field of polyketide synthesis and nonribosomal polypeptide synthesis. In particular, this invention pertains to the isolation, sequencing, and characterization of the gene cluster responsible for the biosynthesis of leinamycin.

BACKGROUND OF THE INVENTION

Polyketides and nonribosomal peptides are two large families of natural products that include many clinically valuable drugs, such as erythromycin and vancomycin (antibacterial), FK506 and cyclosporin (immunosuppressant), and epothilone, bleomycin, and leinamycin (antitumor). The biosyntheses of polyketides and nonribosomal peptides are catalyzed by polyketide synthases (PKSs) (Hopwood (1997) Chem. Rev. 97: 2465; Katz (1997) Chem. Rev. 97: 2557; Khosla (1997) Chem. Rev. 97: 2577; Ikeda and Omura (1997) Chem. Rev. 97: 2591; Staunton and Wilkinson (1997) Chem. Rev. 97: 2611; Cane et al. (1998) Science 282: 63) and nonribosomal peptide synthetases (NRPSs) (Cane et al. (1998) Science 282: 63; Marahiel et al. (1997) Chem. Rev. 97: 2651; von Döhren et al. (1997) Chem. Rev. 97: 2675), respectively. Remarkably, PKSs and NRPSs use a very similar strategy for the assembly of these two distinct classes of natural products: sequential condensation of short carboxylic acids and amino acids, respectively. Both utilize the same 4′-phosphopantetheine prosthetic group, via a thioester linkage, to channel the growing polyketide or peptide intermediate during the elongation processes.

Both type I PKSs and NRPSs are multifunctional proteins that are organized into modules. (A module is defined as a set of distinctive domains that encode all the enzyme activities necessary for one cycle of polyketide or peptide chain elongation and associated modifications.) The number and order of modules and the type of domains within a module on each PKS or NRPS protein determine the structural variations of the resulting polyketide and peptide products by dictating the number, order, and choice of the carboxylic acids or amino acids to be incorporated, and the modifications associated with a particular cycle of elongation. Since the modular architecture of both PKS (Cane et al. (1998) Science 282: 63; Katz and Donadio (1993) Ann. Rev. Microbiol. 47: 875; Hutchinson and Fujii (1995) Ann. Rev. Microbiol. 49: 201) and NRPS (Cane et al. (1998) Science 282: 63; Stachelhaus et al. (1995) Science 269: 69; Stachelhaus et al. (1998) Mol. Gen. Genet. 257: 308; Belshaw et al. (1999) Science 284: 486) has been exploited successfully in combinatorial biosynthesis of diverse “unnatural” natural products, a hybrid PKS and NRPS system, capable of incorporating both carboxylic acids and amino acids into the final products, can lead to even greater chemical structural diversity.

Leinamycin (Lnm) is a novel antitumor antibiotic produced by several Streptomyces species (Hara et al. (1989) J. Antibiot. 42: 333-335; Hara et al. (1989) J. Antibiot. 42: 1768-1774; Nakano et al. (1992) Pages 72-75 In Harnessing Biotechnol. 21st Century, Proc. Int. Biotechnol. Symp. Expo. 9th, Ladisch, M. R. and Bose, A., eds., ACS: Washington, D.C.). Its structure was revealed by X-ray crystallographic (Hirayama and Matsuzawa (1993) Chem. Lett. 1957-1958) and spectroscopic analyses (Hara et al. (1989) J. Antibiot. 42: 333-335; Hara et al. (1989) J. Antibiot. 42: 1768-1774) and confirmed by total synthesis (Kanda and Fukuyama (1993) J. Am. Chem. Soc. 115: 8451-8452; Fukuyama and Kanda (1994) J. Synth. Org. Chem. Japan 52: 888-899). It contains an unusual 1,3-dioxo-1,2-dithiolane moiety that is spiro-fused to a thiazole-containing 18-membered lactam ring, a molecular architecture that has not been found to date in any other natural product (FIG. 1).

Lnm exhibits a broad spectrum of antimicrobial activity against Gram-positive and Gram-negative bacteria, but not against fungi. Lnm shows potent antitumor activity in murine tumor models in vivo, including HeLa S3, sarcoma 180, B-16, Colon 26, and leukemia P388. It is also active against murine models inoculated with tumors that are resistant to clinically important antitumor drugs, such as cisplatin, doxorubicin, mitomycin, or cyclophosphamide (Hara et al. (1989) J. Antibiot. 42: 333-335; Hara et al. (1989) J. Antibiot. 42: 1768-1774; Nakano et al. (1992) Pages 72-75 In Harnessing Biotechnol. 21st Century, Proc. Int. Biotechnol. Symp. Expo. 9th, Ladisch, M. R. and Bose, A., eds., ACS: Washington, D.C.). Lnm preferentially inhibits DNA synthesis and interacts directly with DNA to cause single-strand scission of DNA in the presence of thiol agents as cofactors. The presence of the sulfoxide group in the dithiolane moiety is essential for the DNA-cleaving activity (Hara et al. (1990) Biochemistry 29: 5676-5681). Interestingly, simple 1,3-dioxo-1,2-dithiolanes are also thiol-dependent DNA cleaving agents in vitro (Behroozi et al. (1995) J. Org. Chem. 60: 3964-3966; Behroozi et al. (1996) Biochemistry 35: 1768-1774; Mitra et al. (1997) J. Am. Chem. Soc. 119: 11691-11692). However, the mechanisms for DNA cleavage by simple 1,3-dioxo-1,2-dithiolanes and Lnm are distinct: oxidative cleavage by 1,3-dioxo-1,2-dithiolanes, which convert molecular oxygen to DNA-cleaving oxygen radicals in a process mediated by polysulfides (Behroozi et al. (1995) J. Org. Chem. 60: 3964-3966; Behroozi et al. (1996) Biochemistry 35: 1768-1774; Mitra et al. (1997) J. Am. Chem. Soc. 119: 11691-11692), and alkylative cleavage by Lnm mediated by an episulfonium ion intermediate (Mitra et al. (1997) J. Am. Chem. Soc. 119: 11691-11692; Asai et al. (1996) J. Am. Chem. Soc. 118: 6802-6803; Asai et al. (1997) Bioorg. Med. Chem. 5: 723-729) (FIG. 1). The latter mechanism represents an unprecedented mode of action for the thiol-dependent DNA cleavage by Lnm.

Aimed at discovering clinically useful Lnm analogs, both total synthesis (Kanda and Fukuyama (1993) J. Am. Chem. Soc. 115: 8451-8452; Fukuyama and Kanda (1994) J. Synth. Org. Chem. Japan 52: 888-899; Pattenden and Shuker (1991) Tetrahedron Lett. 32: 6625-6628; Pattenden and Shuker (1992) J. Chem. Soc. Perkin Trans I, 1215-1221; Kanda et al. (1992) Tetrahedron Lett. 33: 5701-5704; Pattenden and Thom (1993) Synlett 215-216) and chemical modification of the natural Lnm have been investigated. Modifications at both C-8 hydroxy and C-9 keto groups as well as at the 1,3-dioxo-1,2-dithiolane moiety have generated a number of Lnm analogs with improved antitumor activity and in vivo stability (Kanda et al. (1998) Bioorg. Med. Chem. Lett. 8: 909-912; Kanda et al. (1999) J. Med. Chem. 42: 1330-1332), supporting the wisdom of making novel anticancer drugs based on the Lnm scaffold. However, for a complex molecule like Lnm, chemical total synthesis has very limited practical value, and chemical modification can access only a limited set of functional groups, often requiring multiple extra protection/deprotection steps.

SUMMARY OF THE INVENTION

This invention pertains to the isolation and elucidation of the leinamycin gene cluster (see SEQ ID NO:1). This gene cluster (nucleic acid sequence) encodes all of the open reading frames (ORFs) that encode polypeptides sufficient to direct the biosynthesis of leinamycin. The nucleic acids can be used in their “native” format or recombined in a wide variety of manners to create novel synthetic pathways.

Thus, in one embodiment, this invention provides an isolated nucleic acid comprising a nucleic acid selected from the group consisting of: 1) A nucleic acid encoding one or more leinamycin (lnm) open reading frames (ORFs) identified in Tables 1 and 2 (ORFs −35 through −1, lnmA through lnmZ, and +1 through +9); 2) A nucleic acid encoding a polypeptide encoded by any one or more of leinamycin (lnm) open reading frames (ORFs) identified in Tables 1 and 2 (ORFs −35 through −1, lnmA through lnmZ, and +1 through +9); 3) A nucleic acid comprising the nucleotide sequence of a nucleic acid amplified by polymerase chain reaction (PCR) using any one of the primer pairs identified in Table 2 and the nucleic acid of a leinamycin-producing organism as a template; and 4) A nucleic acid that encodes a protein comprising at least one catalytic domain selected from the group consisting of a condensation (C) domain, an adenylation (A) domain, a peptidyl carrier protein (PCP) domain, a condensation/cyclization domain (Cy), an acyl-carrier protein (ACP)-like domain, an oxidization domain (Ox), an enoyl reductase domain (ER), a methyltransferase domain, a phosphotransferase domain, a peptide synthetase domain, and an aminotransferase domain, and that specifically hybridizes to one or more of lnm ORFs −35 through −1, lnmA through lnmZ, and/or +1 through +9 under stringent conditions.

In certain embodiments, the isolated nucleic acid comprises a nucleic acid encoding at least two, preferably at least three, more preferably at least four, and most preferably at least five, open reading frames independently selected from the group consisting of leinamycin (lnm) open reading frames −35 through −1, lnmA through lnmZ, and +1 through +9. In particularly preferred embodiments, the isolated nucleic acid encodes a module. In another embodiment, the nucleic acid comprises a nucleic acid encoding a module comprising two or more catalytic domains of a protein encoded by a nucleic acid of a leinamycin (lnm) gene cluster where the catalytic domains are selected from the group consisting of a condensation (C) domain, an adenylation (A) domain, a peptidyl carrier protein (PCP) domain, a condensation/cyclization domain (Cy), an acyl-carrier protein (ACP)-like domain, an oxidization domain (Ox), an enoyl reductase domain, a methyltransferase domain, a phosphotransferase domain, a peptide synthetase domain, and an aminotransferase domain. In particularly preferred embodiments, the nucleic acid comprises an open reading frame from SEQ ID NO: 1 or the complement thereof (e.g. as described in Table 2). In other preferred embodiments, the nucleic acid has the nucleotide sequence of a nucleic acid amplified by polymerase chain reaction (PCR) using any one of the primer pairs identified in Table 2 and the nucleic acid of a leinamycin-producing organism (e.g. S. atroolivaceus) as a template.

In still yet another embodiment, this invention provides an isolated nucleic acid comprising a leinamycin (lnm) open reading frame (ORF) or an allelic variant thereof (e.g. a single nucleotide polymorphism (SNP) of a leinamycin (lnm) open reading frame (ORF)).

In another embodiment, this invention provides an isolated gene cluster comprising open reading frames encoding polypeptides sufficient to direct the assembly of a leinamycin.

In one embodiment, this invention provides an isolated nucleic acid encoding a multi-functional protein complex comprising both a polyketide synthase (PKS) and a peptide synthetase (NRPS), where the polyketide synthase or the peptide synthetase has the amino acid sequence of a PKS or an NRPS encoded by the leinamycin (lnm) gene cluster.

This invention also provides for various proteins. Thus, in one embodiment, this invention provides an isolated multi-functional protein complex comprising both a polyketide synthase (PKS) and a peptide synthetase (NRPS), where the polyketide synthase (PKS) and/or the peptide synthetase (NRPS) has the amino acid sequence of a PKS or an NRPS encoded by a nucleic acid from the leinamycin gene cluster.

In another embodiment, this invention provides a polypeptide selected from the group consisting of: 1) A catalytic domain encoded by one or more leinamycin (lnm) open reading frames (ORFs), e.g., ORFs −35 through −1, lnmA through lnmZ, and +1 through +9 (e.g. as identified in Table 2); 2) A catalytic domain encoded by a nucleic acid having the sequence of a nucleic acid amplified by polymerase chain reaction (PCR) using any one of the primer pairs identified in Table 2; and 3) A module comprising two or more catalytic domains of a protein encoded by a nucleic acid of a leinamycin gene cluster. In preferred embodiments, the polypeptide comprises an enzymatic domain selected from the group consisting of a condensation (C) domain, an adenylation (A) domain, a peptidyl carrier protein (PCP) domain, a condensation/cyclization domain (Cy), an acyl-carrier protein (ACP)-like domain, an oxidization domain (Ox), an NADH dehydrogenase domain, a methyltransferase domain, a phosphotransferase domain, a peptide synthetase domain, and an aminotransferase domain. In certain embodiments, the polypeptide comprises domains encoded by at least two, preferably at least three, more preferably at least four, and most preferably at least five, open reading frames independently selected from the group consisting of leinamycin (lnm) open reading frames 1 through 72. In certain embodiments, the polypeptide can comprise a module comprising two or more catalytic domains of a protein encoded by a nucleic acid of a leinamycin gene cluster where the catalytic domains are selected from the group consisting of a condensation (C) domain, an adenylation (A) domain, a peptidyl carrier protein (PCP) domain, a condensation/cyclization domain (Cy), an acyl-carrier protein (ACP)-like domain, an oxidization domain (Ox), an enoyl reductase domain (ER), a methyltransferase domain, a phosphotransferase domain, a peptide synthetase domain, and an aminotransferase domain.

In another embodiment, this invention provides an isolated polypeptide comprising a module where the module is specifically bound by an antibody that specifically binds to a leinamycin (lnm) module (e.g. is cross-reactive with an lnm polypeptide). In preferred embodiments, the polypeptide is specifically bound by an antibody that specifically binds to a polypeptide encoded by a leinamycin open reading frame.

In certain embodiments, this invention provides an expression vector comprising any one or more of the nucleic acids described herein. The nucleic acids are preferably operably linked to a promoter (e.g. constitutive promoter, inducible promoter, tissue specific promoter, etc.). Also provided are host cells transfected and/or transformed with such a vector. Thus, in one embodiment this invention provides a host cell (e.g. bacterial cell) transfected and/or transformed with an exogenous nucleic acid comprising a gene cluster encoding polypeptides sufficient to direct the assembly of a leinamycin or leinamycin analog and/or with a nucleic acid sufficient to introduce a modification into an lnm gene cluster (e.g. via homologous recombination). Particularly preferred cells include, but are not limited to, eukaryotic cells, insect cells, and bacterial cells (e.g. Streptomyces cells).

This invention also provides a method of chemically modifying a molecule. The method involves contacting a molecule that is a substrate for a polypeptide encoded by one or more leinamycin biosynthesis gene cluster open reading frames (e.g. a leinamycin intermediate metabolite) with a polypeptide encoded by one or more leinamycin biosynthesis gene cluster open reading frames, whereby the polypeptide chemically modifies the molecule. In preferred embodiments, the method comprises contacting the molecule with at least two, preferably at least three, more preferably at least four, and most preferably at least five different polypeptides encoded by leinamycin (lnm) gene cluster open reading frames. The contacting can be ex vivo or in a host cell (e.g. a bacterium such as Streptomyces). The molecule can include, but is not limited to, an endogenous metabolite produced by the host cell or an exogenously supplied metabolite. In certain embodiments, the cell is a eukaryotic cell (e.g. a mammalian cell, a yeast cell, a plant cell, a fungal cell, or an insect cell). In certain embodiments, the (substrate) molecule is an amino acid and the polypeptide is a peptide synthetase. In certain embodiments, the polypeptide is an aminotransferase.

In still another embodiment, this invention provides a cell that overexpresses leinamycin. A particularly preferred cell overexpresses a polypeptide encoded by leinamycin open reading frame lnmG and/or lnmL.

This invention also provides a method of coupling a first amino acid to a second amino acid. The method involves contacting the first and second amino acids with a recombinantly expressed leinamycin nonribosomal peptide synthetase (NRPS) (e.g. NRPS-1, NRPS-2, etc.). The contacting can be ex vivo or in a host cell (e.g. a bacterium).

This invention also provides a method of coupling a first fatty acid to a second fatty acid. This method involves contacting the first and second fatty acids with a recombinantly expressed leinamycin polyketide synthase (PKS) (e.g. PKS-1, PKS-2, PKS-3, PKS-4, PKS-5, PKS-6). The contacting can be ex vivo or in a host cell (e.g. a bacterium).

In one embodiment, this invention provides a method of producing a leinamycin or leinamycin analog. The method involves providing a cell transformed with an exogenous nucleic acid comprising a leinamycin gene cluster encoding polypeptides sufficient to direct the assembly of the leinamycin or leinamycin analog; culturing the cell under conditions permitting the biosynthesis of leinamycin or leinamycin analog; and isolating the leinamycin or leinamycin analog from the cell.

In another embodiment, this invention provides a method of producing a leinamycin analog. The method involves providing a cell comprising a leinamycin gene cluster; transfecting the cell with a nucleic acid that alters the leinamycin gene cluster through homologous recombination; culturing the cell under conditions permitting the biosynthesis of the leinamycin analog; and isolating the leinamycin analog from the cell.

In still another embodiment, this invention provides an isolated nucleic acid comprising a nucleic acid encoding a phosphopantetheinyl transferase (PPTase), the nucleic acid encoding a phosphopantetheinyl transferase being selected from the group consisting of: 1) A nucleic acid encoding the protein comprising the amino acid sequence encoded by sap (Streptomyces atroolivaceus phosphopantetheinyl transferase) of FIG. 5; and 2) A nucleic acid encoding a polypeptide having phosphopantetheinyl transferase activity where said nucleic acid specifically hybridizes to the nucleic acid having the sequence encoded by sap (Streptomyces atroolivaceus phosphopantetheinyl transferase) of FIG. 5 under stringent conditions. In a particularly preferred embodiment, the nucleic acid comprises the sequence of lmp in FIG. 5. As described above, the nucleic acid can comprise a vector, and cells comprising such a vector are provided herein. Also provided is a polypeptide encoded by a phosphopantetheinyl transferase nucleic acid described herein.

This invention also provides a method of converting an apo-carrier protein to a holo-carrier protein comprising reacting the apo-carrier protein with coenzyme A and a recombinant phosphopantetheinyl transferase encoded by the lnm PPTase nucleic acid described herein, thereby producing a holo-carrier protein.

In still yet another embodiment, this invention provides a cell comprising a modified leinamycin gene cluster nucleic acid, where the cell produces elevated amounts of leinamycin as compared to the wild type cell. Particularly preferred cells overexpress a resistance gene from the leinamycin gene cluster (e.g. a resistance gene listed in Table 2).

This invention also provides antibodies that specifically bind to a polypeptide encoded by an lnm open reading frame identified in Table 2. The antibodies include, but are not limited to intact antibodies, antibody fragments, and single chain antibodies.

DEFINITIONS

The term “polyketide synthases” (PKSs) refers to multifunctional enzymes related to fatty acid synthases (FASs). PKSs catalyze the biosynthesis of polyketides through repeated (decarboxylative) Claisen condensations between acylthioesters, usually acetyl, propionyl, malonyl or methylmalonyl. Following each condensation, they typically introduce structural variability into the product by catalyzing all, part, or none of a reductive cycle comprising a ketoreduction, dehydration, and enoylreduction on the β-keto group of the growing polyketide chain. PKSs incorporate enormous structural diversity into their products, in addition to varying the condensation cycle, by controlling the overall chain length, choice of primer and extender units and, particularly in the case of aromatic polyketides, regiospecific cyclizations of the nascent polyketide chain. After the carbon chain has grown to a length characteristic of each specific product, it is typically released from the synthase by thiolysis or acyltransfer. Thus, PKSs consist of families of enzymes which work together to produce a given polyketide. Two general classes of PKSs exist. One class, known as Type I PKSs, is represented by the PKSs for macrolides such as erythromycin. These “complex” or “modular” PKSs include assemblies of several large multifunctional proteins carrying, between them, a set of separate active sites for each step of carbon chain assembly and modification (Cortes et al. (1990) Nature 348: 176; Donadio et al. (1991) Science 252: 675; MacNeil et al. (1992) Gene 115: 119). Structural diversity occurs in this class from variations in the number and type of active sites in the PKSs. This class of PKSs displays a one-to-one correlation between the number and clustering of active sites in the primary sequence of the PKS and the structure of the polyketide backbone. The second class of PKSs, called Type II PKSs, is represented by the synthases for aromatic compounds. Type II PKSs typically have a single set of iteratively used active sites (Bibb et al. (1989) EMBO J. 8: 2727; Sherman et al. (1989) EMBO J. 8: 2717; Fernandez-Moreno et al. (1992) J. Biol. Chem. 267: 19278).
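By way of illustration only, the "all, part, or none" reductive cycle described above can be sketched as a lookup from the reductive domains present in a module to the functional group left at the β-carbon. The function name and the abbreviations KR (ketoreductase), DH (dehydratase), and ER (enoyl reductase) are labels chosen for this sketch, and the model assumes a simplified strictly sequential cycle in which each step acts only if the preceding one has occurred.

```python
from typing import AbstractSet


def beta_carbon_state(reductive_domains: AbstractSet[str]) -> str:
    """Functional group left at the beta-carbon after one elongation cycle.

    Simplifying assumption of this sketch: the cycle is strictly sequential
    (KR, then DH, then ER), so a later domain has no effect when an earlier
    one is missing.
    """
    if "KR" not in reductive_domains:
        return "keto"        # none of the cycle: beta-keto group retained
    if "DH" not in reductive_domains:
        return "hydroxyl"    # ketoreduction only
    if "ER" not in reductive_domains:
        return "enoyl"       # ketoreduction and dehydration: double bond
    return "methylene"       # full cycle: fully reduced carbon


if __name__ == "__main__":
    for domains in (set(), {"KR"}, {"KR", "DH"}, {"KR", "DH", "ER"}):
        print(sorted(domains), "->", beta_carbon_state(domains))
```

Under these assumptions, four outcomes per elongation cycle are possible, which is one source of the structural variability the definition refers to.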

A “nonribosomal peptide synthase” (NRPS) refers to an enzymatic complex of eukaryotic or prokaryotic origin, that is responsible for the synthesis of peptides by a nonribosomal mechanism, often known as thiotemplate synthesis (Kleinkauf and von Doehren (1987) Ann. Rev. Microbiol., 41: 259-289). Such peptides, which can be up to 20 or more amino acids in length, can have a linear, cyclic (cyclosporin, tyrocidine, mycobacilline, surfactin and others) or branched cyclic structure (polymyxin, bacitracin and others) and often contain amino acids not present in proteins or modified amino acids through methylation or epimerization.

A “module” refers to a set of distinctive polypeptide domains that encode all the enzyme activities necessary for one cycle of polyketide or peptide chain elongation and associated modifications.
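As a purely illustrative sketch of this definition, a module can be modeled as an ordered set of domains responsible for one elongation cycle, and an assembly line as an ordered list of modules. The class name, field names, and the example modules below are hypothetical; the KS and AT abbreviations are generic textbook PKS domain names rather than terms taken from this document, and the example does not represent the actual organization of the lnm cluster.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Module:
    kind: str                 # "PKS" or "NRPS"
    domains: Tuple[str, ...]  # ordered catalytic domains, e.g. ("C", "A", "PCP")
    substrate: str            # building block added in this cycle


def describe_assembly_line(modules: List[Module]) -> List[str]:
    """One line of text per elongation cycle, in module order."""
    return [
        f"{i}: {m.kind}[{'-'.join(m.domains)}] + {m.substrate}"
        for i, m in enumerate(modules, start=1)
    ]


if __name__ == "__main__":
    # Hypothetical two-module hybrid line (not the lnm architecture).
    line = [
        Module("NRPS", ("C", "A", "PCP"), "amino acid"),
        Module("PKS", ("KS", "AT", "ACP"), "carboxylic acid"),
    ]
    for row in describe_assembly_line(line):
        print(row)
```

The ordered list captures the point that the number and order of modules, together with the domains in each, determine the product.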

The terms “isolated”, “purified”, or “biologically pure” refer to material which is substantially or essentially free from components which normally accompany it as found in its native state. With respect to nucleic acids and/or polypeptides, the term can refer to nucleic acids or polypeptides that are no longer flanked by the sequences typically flanking them in nature.

The terms “polypeptide”, “peptide” and “protein” are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residue is an artificial chemical analogue of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers. The term also includes variants on the traditional peptide linkage joining the amino acids making up the polypeptide.

The terms “nucleic acid” or “oligonucleotide” or grammatical equivalents herein refer to at least two nucleotides covalently linked together. A nucleic acid of the present invention is preferably single-stranded or double-stranded and will generally contain phosphodiester bonds, although in some cases, as outlined below, nucleic acid analogs are included that may have alternate backbones, comprising, for example, phosphoramide (Beaucage et al. (1993) Tetrahedron 49(10): 1925 and references therein; Letsinger (1970) J. Org. Chem. 35: 3800; Sprinzl et al. (1977) Eur. J. Biochem. 81: 579; Letsinger et al. (1986) Nucl. Acids Res. 14: 3487; Sawai et al. (1984) Chem. Lett. 805; Letsinger et al. (1988) J. Am. Chem. Soc. 110: 4470; and Pauwels et al. (1986) Chemica Scripta 26: 1419), phosphorothioate (Mag et al. (1991) Nucleic Acids Res. 19: 1437; and U.S. Pat. No. 5,644,048), phosphorodithioate (Briu et al. (1989) J. Am. Chem. Soc. 111: 2321), O-methylphosphoroamidite linkages (see Eckstein, Oligonucleotides and Analogues: A Practical Approach, Oxford University Press), and peptide nucleic acid backbones and linkages (see Egholm (1992) J. Am. Chem. Soc. 114: 1895; Meier et al. (1992) Chem. Int. Ed. Engl. 31: 1008; Nielsen (1993) Nature 365: 566; Carlsson et al. (1996) Nature 380: 207). Other analog nucleic acids include those with positive backbones (Dempcy et al. (1995) Proc. Natl. Acad. Sci. USA 92: 6097); non-ionic backbones (U.S. Pat. Nos. 5,386,023, 5,637,684, 5,602,240, 5,216,141 and 4,469,863; Angew. (1991) Chem. Intl. Ed. English 30: 423; Letsinger et al. (1988) J. Am. Chem. Soc. 110: 4470; Letsinger et al. (1994) Nucleoside & Nucleotide 13: 1597; Chapters 2 and 3, ACS Symposium Series 580, “Carbohydrate Modifications in Antisense Research”, Ed. Y. S. Sanghvi and P. Dan Cook; Mesmaeker et al. (1994) Bioorganic & Medicinal Chem. Lett. 4: 395; Jeffs et al. (1994) J. Biomolecular NMR 34: 17; Tetrahedron Lett. 37: 743 (1996)); and non-ribose backbones, including those described in U.S. Pat. Nos. 5,235,033 and 5,034,506, and Chapters 6 and 7, ACS Symposium Series 580, “Carbohydrate Modifications in Antisense Research”, Ed. Y. S. Sanghvi and P. Dan Cook. Nucleic acids containing one or more carbocyclic sugars are also included within the definition of nucleic acids (see Jenkins et al. (1995) Chem. Soc. Rev. pp. 169-176). Several nucleic acid analogs are described in Rawls, C & E News Jun. 2, 1997 page 35. These modifications of the ribose-phosphate backbone may be done to facilitate the addition of additional moieties such as labels, or to increase the stability and half-life of such molecules in physiological environments.

The term “heterologous” as it relates to nucleic acid sequences such as coding sequences and control sequences, denotes sequences that are not normally associated with a region of a recombinant construct, and/or are not normally associated with a particular cell. Thus, a “heterologous” region of a nucleic acid construct is an identifiable segment of nucleic acid within or attached to another nucleic acid molecule that is not found in association with the other molecule in nature. For example, a heterologous region of a construct could include a coding sequence flanked by sequences not found in association with the coding sequence in nature. Another example of a heterologous coding sequence is a construct where the coding sequence itself is not found in nature (e.g., synthetic sequences having codons different from the native gene). Similarly, a host cell transformed with a construct which is not normally present in the host cell would be considered heterologous for purposes of this invention.

A “coding sequence” or a sequence that “encodes” a particular polypeptide (e.g. a PKS, an NRPS, etc.), is a nucleic acid sequence which is ultimately transcribed and/or translated into that polypeptide in vitro and/or in vivo when placed under the control of appropriate regulatory sequences. In certain embodiments, the boundaries of the coding sequence are determined by a start codon at the 5′ (amino) terminus and a translation stop codon at the 3′ (carboxy) terminus. A coding sequence can include, but is not limited to, cDNA from prokaryotic or eukaryotic mRNA, genomic DNA sequences from prokaryotic or eukaryotic DNA, and even synthetic DNA sequences. In preferred embodiments, a transcription termination sequence will usually be located 3′ to the coding sequence.
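The boundary rule stated above (a start codon at the 5′ end and a translation stop codon at the 3′ end) can be illustrated with a minimal sketch that scans one strand for the first in-frame start-to-stop span. The function name is hypothetical, and the choice of ATG as the only start codon is a simplifying assumption of this sketch; real gene calling, particularly in high-GC organisms, considers alternative start codons, both strands, and other evidence.

```python
from typing import Optional, Tuple

START_CODONS = {"ATG"}                 # assumption: alternative starts are ignored
STOP_CODONS = {"TAA", "TAG", "TGA"}    # standard stop codons


def first_coding_sequence(dna: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first start-to-stop span on this strand.

    Indices are 0-based, end is exclusive and includes the stop codon.
    Returns None when no start codon has an in-frame stop codon.
    """
    dna = dna.upper()
    for start in range(len(dna) - 2):
        if dna[start:start + 3] not in START_CODONS:
            continue
        # Walk codon by codon in the frame set by the start codon.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                return start, pos + 3
    return None


if __name__ == "__main__":
    seq = "CCATGGCTTAAGG"
    span = first_coding_sequence(seq)
    if span is not None:
        print(span, seq[span[0]:span[1]])
```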

Expression “control sequences” refers collectively to promoter sequences, ribosome binding sites, polyadenylation signals, transcription termination sequences, upstream regulatory domains, enhancers, and the like, which collectively provide for the transcription and translation of a coding sequence in a host cell. Not all of these control sequences need always be present in a recombinant vector so long as the desired gene is capable of being transcribed and translated.

“Recombination” refers to the reassortment of sections of DNA or RNA sequences between two DNA or RNA molecules. “Homologous recombination” occurs between two DNA molecules which hybridize by virtue of homologous or complementary nucleotide sequences present in each DNA molecule.

The terms “stringent conditions” or “hybridization under stringent conditions” refer to conditions under which a probe will hybridize preferentially to its target subsequence, and to a lesser extent to, or not at all to, other sequences. “Stringent hybridization” and “stringent hybridization wash conditions” in the context of nucleic acid hybridization experiments such as Southern and northern hybridizations are sequence dependent, and are different under different environmental parameters. An extensive guide to the hybridization of nucleic acids is found in Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology—Hybridization with Nucleic Acid Probes part I chapter 2 Overview of principles of hybridization and the strategy of nucleic acid probe assays, Elsevier, N.Y. Generally, highly stringent hybridization and wash conditions are selected to be about 5° C. lower than the thermal melting point (T_(m)) for the specific sequence at a defined ionic strength and pH. The T_(m) is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Very stringent conditions are selected to be equal to the T_(m) for a particular probe.
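As a worked illustration of the two temperature rules just stated, the following sketch takes a T_(m) that has already been determined for a probe and returns the corresponding target temperature. The names are hypothetical, and the sketch deliberately does not estimate T_(m) itself, which depends on sequence, ionic strength, and pH.

```python
# Offsets below T_m, in degrees Celsius, taken from the definitions above.
OFFSET_BELOW_TM_C = {
    "very stringent": 0.0,    # equal to T_m
    "highly stringent": 5.0,  # about 5 degrees C below T_m
}


def target_temperature(tm_celsius: float, level: str = "highly stringent") -> float:
    """Hybridization/wash temperature for a probe with the given T_m."""
    return tm_celsius - OFFSET_BELOW_TM_C[level]


if __name__ == "__main__":
    tm = 70.0  # hypothetical T_m for an example probe
    for level in OFFSET_BELOW_TM_C:
        print(level, target_temperature(tm, level))
```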

An example of stringent hybridization conditions for hybridization of complementary nucleic acids which have more than 100 complementary residues on a filter in a Southern or northern blot is 50% formamide with 1 mg of heparin at 42° C., with the hybridization being carried out overnight. An example of highly stringent wash conditions is 0.15 M NaCl at 72° C. for about 15 minutes. An example of stringent wash conditions is a 0.2×SSC wash at 65° C. for 15 minutes (see, Sambrook et al. (1989) Molecular Cloning—A Laboratory Manual (2nd ed.) Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor Press, NY, for a description of SSC buffer). Often, a high stringency wash is preceded by a low stringency wash to remove background probe signal. An example medium stringency wash for a duplex of, e.g., more than 100 nucleotides, is 1×SSC at 45° C. for 15 minutes. An example low stringency wash for a duplex of, e.g., more than 100 nucleotides, is 4-6×SSC at 40° C. for 15 minutes. In general, a signal to noise ratio of 2× (or higher) than that observed for an unrelated probe in the particular hybridization assay indicates detection of a specific hybridization. Nucleic acids which do not hybridize to each other under stringent conditions are still substantially identical if the polypeptides which they encode are substantially identical. This occurs, e.g., when a copy of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code.
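The example wash conditions above can be collected into a small lookup, shown here only as a sketch. The dictionary layout and function names are hypothetical. The conversion factor assumes the standard composition of SSC, in which 1×SSC contains 0.15 M NaCl (with 0.015 M sodium citrate); that composition is not stated in this document, which refers to Sambrook et al. for the buffer.

```python
from typing import Tuple

NACL_MOLAR_PER_1X_SSC = 0.15  # assumption: standard SSC composition

# SSC strength is stored as a (min, max) range in multiples of 1x SSC.
WASHES = {
    "stringent": {"ssc_x": (0.2, 0.2), "temp_c": 65, "minutes": 15},
    "medium":    {"ssc_x": (1.0, 1.0), "temp_c": 45, "minutes": 15},
    "low":       {"ssc_x": (4.0, 6.0), "temp_c": 40, "minutes": 15},
}


def nacl_molar_range(wash_name: str) -> Tuple[float, float]:
    """NaCl concentration range (mol/L) implied by a wash's SSC strength."""
    low, high = WASHES[wash_name]["ssc_x"]
    return low * NACL_MOLAR_PER_1X_SSC, high * NACL_MOLAR_PER_1X_SSC


def is_specific_signal(signal: float, unrelated_probe_signal: float,
                       min_ratio: float = 2.0) -> bool:
    """Apply the 2x signal-to-noise criterion relative to an unrelated probe."""
    return signal >= min_ratio * unrelated_probe_signal


if __name__ == "__main__":
    for name in WASHES:
        print(name, nacl_molar_range(name))
    print(is_specific_signal(10.0, 5.0))
```

On that assumption, the 0.15 M NaCl given for the highly stringent wash corresponds to the NaCl concentration of 1×SSC, with the added stringency coming from the 72° C. temperature.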

A “library” or “combinatorial library” of polyketides and/or polypeptides is intended to mean a collection of polyketides and/or polypeptides (or other molecules) catalytically produced by a PKS and/or NRPS and/or hybrid PKS/NRPS (or other possible combination of synthetic elements) gene cluster. The library can be produced by a gene cluster that contains any combination of native, homolog or mutant genes from aromatic, modular or fungal PKSs and/or NRPSs. The combination of genes can be derived from a single PKS and/or NRPS gene cluster, e.g., act, fren, gra, tcm, whiE, gris, ery, or the like, and may optionally include genes encoding tailoring enzymes which are capable of catalyzing the further modification of a polypeptide, polyketide, or other molecule. Alternatively, the combination of genes can be rationally or stochastically derived from an assortment of NRPS and/or PKS gene clusters. The library of polyketides and/or polypeptides and/or other molecules thus produced can be tested or screened for biological, pharmacological or other activity.

By “random assortment” is intended any combination and/or order of genes, homologs or mutants which encode for the various PKS and/or NRPS enzymes, modules, active sites or portions thereof derived from aromatic, modular or fungal PKS and/or NRPS gene clusters.

By “genetically engineered host cell” is meant a host cell where the native PKS and/or NRPS gene cluster has been altered or deleted using recombinant DNA techniques or a host cell into which a heterologous PKS and/or NRPS and/or hybrid PKS/NRPS gene cluster has been inserted. Thus, the term would not encompass mutational events occurring in nature. A “host cell” is a cell derived from a prokaryotic microorganism or a eukaryotic cell line cultured as a unicellular entity, which can be, or has been, used as a recipient for recombinant vectors bearing the PKS, NRPS, and/or hybrid gene clusters of the invention. The term includes the progeny of the original cell which has been transfected. It is understood that the progeny of a single parental cell may not necessarily be completely identical in morphology or in genomic or total DNA complement to the original parent, due to accidental or deliberate mutation. Progeny of the parental cell which are sufficiently similar to the parent to be characterized by the relevant property, such as the presence of a nucleotide sequence encoding a desired PKS, are included in the definition, and are covered by the above terms.

Expression vectors are defined herein as nucleic acid sequences that direct the transcription of cloned copies of genes/cDNAs and/or the translation of their mRNAs in an appropriate host. Such vectors can be used to express genes or cDNAs in a variety of hosts such as bacteria, blue-green algae, plant cells, insect cells and animal cells. Expression vectors include, but are not limited to, cloning vectors, modified cloning vectors, specifically designed plasmids or viruses. Specifically designed vectors allow the shuttling of DNA between hosts, such as bacteria-yeast or bacteria-animal cells. An appropriately constructed expression vector preferably contains: an origin of replication for autonomous replication in a host cell, a selectable marker, optionally one or more restriction enzyme sites, and optionally one or more constitutive or inducible promoters. In preferred embodiments, an expression vector is a replicable DNA construct in which a DNA sequence encoding one or more PKS and/or NRPS domains and/or modules is operably linked to suitable control sequences capable of effecting the expression of the products of these synthases and/or synthetases in a suitable host. Control sequences include a transcriptional promoter, an optional operator sequence to control transcription, sequences that control the termination of transcription and translation, and so forth.

A “leinamycin (Lnm) open reading frame”, or “leinamycin ORF”, or “Lnm Orf” refers to a nucleic acid open reading frame that encodes a polypeptide or polypeptide domain that has an enzymatic activity used in the biosynthesis of a leinamycin.

A “PKS/NRPS/PKS” system refers to a synthetic system comprising an NRPS flanked by two PKSs. A “NRPS/PKS/NRPS” system refers to a synthetic system comprising a PKS flanked by two NRPSs. A “hybrid PKS/NRPS system” or a “hybrid NRPS/PKS system” refers to a hybrid synthetic system comprising at least one PKS and one NRPS module. The system can comprise multiple modules and the order can vary.
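As an informal illustration of these naming conventions, the following Python sketch labels a system from the order of its modules. The function names are hypothetical, and the sketch assumes that consecutive modules of the same type are collapsed into a single block when the label is written; that convention is an assumption of the example, not a definition from this specification.

```python
def is_hybrid(modules):
    """A hybrid system comprises at least one PKS and at least one NRPS module."""
    return "PKS" in modules and "NRPS" in modules


def system_label(modules):
    """Collapse runs of identical module types into a label such as 'PKS/NRPS/PKS'."""
    blocks = []
    for kind in modules:
        if kind not in ("PKS", "NRPS"):
            raise ValueError(f"unknown module type: {kind}")
        # Start a new block only when the module type changes.
        if not blocks or blocks[-1] != kind:
            blocks.append(kind)
    return "/".join(blocks)
```

Under these assumptions, the module order NRPS, NRPS, PKS, PKS is labeled "NRPS/PKS", and PKS, NRPS, PKS is labeled "PKS/NRPS/PKS", i.e. an NRPS flanked by two PKSs.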

A “biological molecule that is a substrate for a polypeptide encoded by a leinamycin biosynthesis gene” refers to a molecule that is chemically modified by one or more polypeptides encoded by open reading frame(s) of the Lnm gene cluster. The “substrate” may be a native molecule that typically participates in the biosynthesis of a leinamycin, or can be any other molecule that can be similarly acted upon by the polypeptide.

A “polymorphism” is a variation in the DNA sequence of some members of a species. A polymorphism is thus said to be “allelic,” in that, due to the existence of the polymorphism, some members of a species may have the unmutated sequence (i.e. the original “allele”) whereas other members may have a mutated sequence (i.e. the variant or mutant “allele”). In the simplest case, only one mutated sequence may exist, and the polymorphism is said to be diallelic. In the case of diallelic diploid organisms, three genotypes are possible. They can be homozygous for one allele, homozygous for the other allele or heterozygous. In the case of diallelic haploid organisms, they can have one allele or the other, thus only two genotypes are possible. The occurrence of alternative mutations can give rise to triallelic, etc. polymorphisms. An allele may be referred to by the nucleotide(s) that comprise the mutation.
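The genotype counts given above can be checked by simple enumeration. The following Python sketch is illustrative only; the function names and allele labels are hypothetical. A haploid organism carries one allele, so there are as many genotypes as alleles, while a diploid organism carries an unordered pair of alleles.

```python
from itertools import combinations_with_replacement


def haploid_genotypes(alleles):
    """One allele per organism: each allele is its own genotype."""
    return list(alleles)


def diploid_genotypes(alleles):
    """Unordered allele pairs: homozygotes plus heterozygotes."""
    return list(combinations_with_replacement(alleles, 2))
```

For a diallelic polymorphism with alleles "A" and "a", this enumeration gives two haploid genotypes and three diploid genotypes (A/A, A/a, a/a), in agreement with the text; a triallelic polymorphism gives three and six, respectively.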

“Single nucleotide polymorphisms” or “SNPs” are defined by their characteristic attributes. A central attribute of such a polymorphism is that it contains a polymorphic site, “X,” most preferably occupied by a single nucleotide, which is the site of the polymorphism's variation (Goelet and Knapp U.S. patent application Ser. No. 08/145,145). Methods of identifying SNPs are well known to those of skill in the art (see, e.g., U.S. Pat. No. 5,952,174).

The following abbreviations are used herein: A, adenylation; ACP, acyl carrier protein; AT, acyltransferase; lnm, leinamycin; C, condensation; CL, co-enzyme A ligase; Cy, condensation/cyclization; DH, dehydratase; ER, enoyl reductase; KR, ketoreductase; KS, ketoacyl synthase; MT, methyltransferase; NRPS, nonribosomal peptide synthetase; orf, open reading frame; Ox, oxidation; PCP, peptidyl carrier protein; PCR, polymerase chain reaction; PKS, polyketide synthase; ArCP, aryl carrier protein; bp, base pair; CoA, co-enzyme A; DTT, dithiothreitol; FAS, fatty acid synthase; kb, kilobase; PPTase, 4′-phosphopantetheinyl transferase; TCA, trichloroacetic acid; TE, thioesterase.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates the structure of leinamycin (Lnm) and the proposed mechanism for its oxidative and alkylative DNA-cleavage.

FIG. 2 shows a map of the 132 kb DNA from S. atroolivaceus that harbors the leinamycin biosynthetic gene cluster.

FIG. 3 shows the genetic organization of the leinamycin biosynthetic gene cluster from S. atroolivaceus. NRPS and PKS genes are shown by solid arrows.

FIG. 4 illustrates a proposed Lnm biosynthetic pathway in S. atroolivaceus involving a hybrid NRPS/PKS. Abbreviations: A, adenylation; ACP, acyl carrier protein; AT, acyltransferase; C, condensation; CL, co-enzyme A ligase; Cy, condensation/cyclization; DH, dehydratase; ER, enoyl reductase; KR, ketoreductase; KS, ketoacyl synthase; Ox, oxidation; MT, methyltransferase; PCP, peptidyl carrier protein; TE, thioesterase.

FIG. 5 shows an alignment of the Streptomyces lnm PPTase with PPTase fragments from other species. SC5A7 (SEQ ID NO:223), color (SEQ ID NO:224), Svp (SEQ ID NO:225), gris (SEQ ID NO:226), albua (SEQ ID NO:227), lnm (SEQ ID NO:228), consensus (SEQ ID NO:229).

FIG. 6 illustrates how altering lnmG expression alters leinamycin production. Inactivation of lnmG yields an S. atroolivaceus mutant strain whose ability to produce leinamycin is completely abolished. Introduction of an lnmG overexpression plasmid into the S. atroolivaceus lnmG mutant not only restores its ability to produce leinamycin but can also result in an overproduction of leinamycin in comparison with the wild type S. atroolivaceus strain. Thus, the S. atroolivaceus lnmG mutant transformed with a low-copy-number (10) plasmid in which the expression of lnmG is under the control of the ermE* promoter produces a level of leinamycin similar to that of the wild type S. atroolivaceus strain. The S. atroolivaceus lnmG mutant transformed with a medium-copy-number (300) plasmid in which the expression of lnmG is under the control of the ermE* promoter produces 3-5 fold more leinamycin than the wild type S. atroolivaceus strain.

FIG. 7 illustrates examples of novel Lnm analogs that can be prepared by manipulating the Lnm NRPS and PKS genes.

FIG. 8 illustrates nucleophilic attack of the episulfonium ion by —NH₂ or H₂O via S_(N)1 or S_(N)2 mechanism.

FIGS. 9A-9C illustrate the production of novel leinamycins by engineering leinamycin biosynthesis. FIG. 9A: LnmH is a protein of unknown function on the basis of amino acid sequence analysis. Inactivation of LnmH yields an S. atroolivaceus mutant that no longer produces leinamycin but accumulates at least two new leinamycin metabolites upon HPLC analysis. Complementation of the LnmH mutant by overexpression of LnmH under the ermE* promoter in a low-copy-number plasmid restores leinamycin production in the mutant strain, with the same metabolite profile as the wild type S. atroolivaceus strain. In FIG. 9B, leinamycin production was compared in fermentation media supplemented with various concentrations of D-alanine. Leinamycin production can be improved 3-5 fold upon addition of 25 mM D-alanine. FIG. 9C shows that inactivation of either one of the two P-450 hydroxylases (LnmA or LnmZ) can lead to the production of new leinamycins, one of which could be the 8-dehydroxyl-Lnm shown in FIG. 7.

FIG. 10 shows the results of targeted gene inactivation of lnmG and lnmI on leinamycin biosynthesis and establishes the involvement of the cloned gene cluster in leinamycin production. LnmG is a di-domain protein that shows high amino acid sequence homology to known polyketide synthases. LnmI is a multi-domain protein that shows high amino acid sequence homology to known polyketide synthases and nonribosomal peptide synthetases. Inactivation of lnmG and lnmI, respectively, by targeted gene replacement experiments, produces S. atroolivaceus mutant strains that no longer produce leinamycin and its biosynthetic intermediates. This was confirmed by HPLC analysis.

DETAILED DESCRIPTION

This invention pertains to the isolation, identification, and characterization of the gene cluster that directs synthesis of leinamycin (Lnm). Leinamycin is a macrolactam of hybrid polyketide and nonribosomal peptide origin with an unprecedented 1,3-dioxo-1,2-dithiolane structure. Lnm shows potent antitumor activity in tumor models in vivo, including those that are resistant to clinically important anticancer drugs. Although Lnm analogs with improved antitumor activity have been generated by chemical modifications, supporting the wisdom of making novel anticancer drugs based on the Lnm scaffold, development of Lnm into a clinical anticancer drug has been hampered by the in vivo instability of the natural Lnm.

Genetic manipulations of both polyketide and nonribosomal peptide biosynthesis have generally been very successful in generating novel “unnatural” natural products. In a similar manner, extension of these genetic-based approaches, also known as “Combinatorial Biosynthesis”, to Lnm biosynthesis provides the ability to generate novel Lnm analogs, which are expected to provide useful “lead compounds” for the development of novel anticancer drugs.

The leinamycin synthetic pathway described herein utilizes a hybrid polyketide-peptide biosynthesis. Polyketides and polypeptides can be assembled in a remarkably similar manner by repetitive addition of an extending unit to a growing chain by polyketide synthases (PKSs) and nonribosomal peptide synthetases (NRPSs), respectively. In the case of polyketides, the extending unit is typically a fatty acid (activated as an acyl CoA thioester) while the extending unit for polypeptides is typically an amino acid (activated as an aminoacyl adenylate). Both the PKS and NRPS systems have evolved a modular organization to define the number, sequence, and specificity of the incorporation of the extending unit and utilize the 4′-phosphopantetheine prosthetic group to channel the growing intermediate during the elongation process.

Polyketide metabolites display enormous structural diversity, yet share a common mechanism of biosynthesis. The carbon backbones of polyketides are assembled from short carboxylic acids by sequential decarboxylative condensation, and this process is catalyzed by PKSs. Two types of bacterial PKSs have been characterized to explain the polyketide biochemistry (Katz and Donadio (1993) Ann. Rev. Microbiol. 47: 875-912; Hutchinson and Fuji, (1995) Ann. Rev. Microbiol., 49: 201-38; Cane (1997) Chem. Rev. 97: 2463-2705; Carreras et al. (1997) Top. Curr. Chem. 188: 85-126; Cane, et al. (1998) Science 282: 63-68; Staunton and Wilkinson (1998) Top. Curr. Chem. 195: 49-92; Hopwood and Sherman (1990) Ann. Rev. Genet. 24: 37-66). Type I enzymes are multifunctional proteins that harbor sets of noniteratively used distinct active sites, termed modules, for the catalysis of each cycle of polyketide chain elongation in biosynthesis of reduced polyketides like macrolide, polyether, or polyene antibiotics (Cane (1997) Chem. Rev. 97: 2463-2705; Carreras et al. (1997) Top. Curr. Chem. 188: 85-126; Staunton and Wilkinson (1998) Top. Curr. Chem. 195: 49-92). Type II enzymes are multienzyme complexes that carry a single set of iteratively-used activities and consist of several monofunctional proteins for the synthesis of aromatic polyketides like tetracyclines (Cane (1997) Chem. Rev. 97: 2463-2705). The growing polyketide intermediates in both systems remain covalently attached to the acyl carrier protein (ACP) of the PKS enzyme via the 4′-phosphopantetheine cofactor during the elongation process (Shen, et al. (1997) J. Bacteriol., 174: 3818-3821; Carreras and Khosla (1998) Biochemistry 37: 2084-2088).

Nonribosomal peptides, a structurally diverse family of bioactive peptides, are assembled nonribosomally from both proteinogenic and nonproteinogenic amino acids, and this process is catalyzed by NRPSs. Remarkably, NRPSs possess a multimodular structure similar to that of type I PKSs, and these modules represent the functional building units of an NRPS that activate, modify and link together by amide or ester bonds the constituent amino acids of the peptide product (Cane (1997) Chem. Rev. 97: 2463-2705; Carreras et al. (1997) Top. Curr. Chem. 188: 85-126; Cane, et al. (1998) Science 282: 63-68; Cane and Walsh (1999) Chem. Biol. 6: R319-R325; Konz and Marahiel (1999) Chem. Biol. 6: R39-R48; von Döhren et al. (1999) Chem. Biol. 6: R273-R279). Modules can be either physically linked together, or interact noncovalently via protein/protein recognition to form the protein-template that dictates the number and sequence of the amino acids incorporated into the peptide products. Individual modules are characterized by domains that catalyze the activation of constituent amino acids as acyladenylates (A domain) (Weinreb et al. (1998) Biochemistry 37: 1575-1584; Stachelhaus and Marahiel (1995) J. Biol. Chem. 270: 6163-6169; Konz et al. (1997) Chem. Biol. 4: 927-937; Mootz and Marahiel (1997) J. Bacteriol. 179: 6843-6850; Stachelhaus et al. (1999) Chem. Biol., 6: 493-505; Challis et al. (2000) Chem. Biol. 7: 211-224), the thioesterification of the activated amino acids with the sulfhydryl group of the 4′-phosphopantetheine cofactor that is covalently bound to the C-terminal region of each amino acid activating domain (thiolation domain or PCP) (Gehring et al. (1997) Chem. Biol., 4: 17-24; Lambalot et al. (1996) Chem. Biol. 3: 923-936; Weinreb et al. (1998) Biochemistry 37: 1575-1584; Stachelhaus and Marahiel (1995) J. Biol. Chem. 270: 6163-6169; Konz et al. (1997) Chem. Biol. 4: 927-937; Mootz and Marahiel (1997) J. Bacteriol. 179: 6843-6850; Stachelhaus et al. (1999) Chem. Biol., 6: 493-505; Challis et al. (2000) Chem. Biol. 7: 211-224; Stachelhaus et al. (1996) Chem. Biol. 3: 913-921; Pfeifer et al. (1995) Biochemistry, 34: 7450-7459; Haese et al. (1994) J. Mol. Biol. 243: 116-122; Quadri et al. (1998) Biochemistry, 37: 1585-1595; Rose et al. (1998) Nucleic Acids Res., 26: 1628-1635; Ku et al. (1997) Chem. Biol., 4: 203-207; Reuter et al. (1999) EMBO J., 18: 6823-6831), and the transpeptidation of carboxy thioester activated amino acids between the aligned domains, resulting in the formation of a specific peptide chain with a defined sequence (condensation or C domain) (Stachelhaus et al. (1998) J. Biol. Chem., 273: 22773-22781).

Additional domains have also been identified that are responsible for the modification of the peptide product, such as epimerization (E) domains for the conversion of L-amino acids to D-amino acids (Marahiel et al. (1997) Chem. Rev., 97: 2651-2673), N-methyltransferase (MT) domains for the addition of methyl groups to the amide nitrogen (Id.), cyclization (Cy) domains for the formation of heterocyclic rings (Konz et al. (1997) Chem. Biol. 4: 927-937), a reduction domain for reductive release of an aldehyde product (Ehmann et al. (1999) Biochemistry, 38: 6171-6177), and oxidation domains for the thiazolidine-to-thiazole oxidation (Ox) (Molnar et al. (1999) Chem. Biol., 7: 97-109) or for α-hydroxylation of the incorporated amino acid (Ox′) (Silakowski et al. (1999) J. Biol. Chem., 274: 37391-37399).

Hybrid peptide-polyketide metabolites, such as leinamycin, are structurally characterized by both the amino acid and the carboxylic acid building blocks. The assembly of hybrid peptide-polyketide metabolites from amino acids and carboxylic acids is catalyzed by a hybrid NRPS-PKS system that bears the characteristics of both NRPS and PKS (Du et al. (2001) Metabolic Eng. 3: 78-95; Du and Shen (2001) Curr. Opinion Drug Discov. Dev. 4: 215-228). The interacting NRPS and PKS modules can be either covalently linked by arranging the catalytic domains in a linear order on the same protein, or physically located on two or more separate proteins, utilizing specific protein-protein recognition to ensure the correct pairing between the interacting modules. One such hybrid system is exemplified by the bleomycin biosynthesis, BlmIX NRPS/BlmVIII PKS/BlmVII NRPS system, combining the features of both hybrid NRPS/PKS and PKS/NRPS systems (see e.g. U.S. Ser. No. PCT/US00/00445; Shen et al. (1999) Bioorg. Chem., 27: 155-171).

In one aspect, this invention provides a cloned and characterized leinamycin (lnm) gene cluster (˜61.3 kb) consisting of characteristic NRPS and PKS genes from the leinamycin producer Streptomyces atroolivaceus. The cloned and isolated leinamycin (lnm) gene cluster and subunits (e.g. ORFs) thereof provide a method of recombinantly expressing leinamycin and/or leinamycin analogues. Thus, in one embodiment, this invention provides for nucleic acids encoding leinamycin synthetic machinery or subunits thereof, for cells recombinantly modified to express a leinamycin and/or leinamycin analogue, and for a leinamycin or leinamycin analogue recombinantly expressed in such cells.

Like other polyketide synthase or nonribosomal peptide synthetases, the leinamycin synthetic pathway is organized into modules, each module catalyzing the addition and/or modification of one subunit (e.g. fatty acid and/or amino acid). Each module is organized into a number of domains each domain having a characteristic activity (e.g. activation, condensation, condensation/cyclization, etc.). The catalytic domains within a module and the modules themselves are often arranged collinearly and the order of biosynthetic modules from NH₂— to COOH-terminus on each PKS and NRPS polypeptide and the number and type of catalytic domains within each determine the order of structural and functional elements in the resulting product.

The size and complexity of the ultimately formed product are controlled, in part, by the number of repeated acyl chain extension steps that are, in turn, a function of the number and placement of carrier protein domains in these multimodular enzymes. The number, composition, and order of such domains can be altered either to introduce modifications, e.g. into the leinamycin to produce leinamycin analogues, or to produce different or completely new molecules. Such “recombination” is not restricted solely to recombination among the leinamycin catalytic domains and/or modules, but can also involve recombination between leinamycin modules and/or subunits and other PKS and/or NRPS modules and/or subunits (e.g. bleomycin subunits). Moreover, the discovery that synthetic pathways can incorporate both PKS and NRPS modules and/or catalytic domains makes available hybrid PKS/NRPS syntheses.

Thus, in one embodiment this invention contemplates the use of lnm gene cluster modules and/or catalytic domains to make various peptide and/or polyketide, and/or hybrid polypeptide/polyketide metabolites (including, but not limited to leinamycin, leinamycin intermediates or shunt metabolites), in combinatorial biosynthesis with other polyketide synthases and/or other nonribosomal peptide synthetases.

In particular, it is noted that the various lnm ORFs show characteristic enzymatic activities including, but not limited to aminotransferases, peptide synthetases, thioesterases, decarboxylases, and the like. The proteins encoded by these orfs can be used alone, or in combination with other active domains to modify various target substrates.

This invention also includes the discovery and characterization of a novel PPTase (a fragment of which is shown and named lnm in FIG. 5). This PPTase can be cloned and used in engineered biosynthesis of polyketides, peptides, hybrid peptide and polyketide metabolites, hybrid polyketide and peptide metabolites, or the combination of both types of metabolites. The PPTase can also be used in converting apo-peptidyl carrier proteins (both type I and type II) and acyl carrier proteins (both type I and type II) into the holo-proteins.

In certain preferred embodiments, this invention contemplates the use of leinamycin (lnm) gene cluster modules and/or catalytic domains to produce leinamycin (e.g. to upregulate endogenous leinamycin production, to permit leinamycin production in cells other than Streptomyces, etc.) and/or to make various modified leinamycins (leinamycin analogues).

The Examples provided herein and the accompanying primers permit one of ordinary skill in the art to isolate the leinamycin (lnm) gene cluster of this invention, its constituent orfs, various modules, or enzymatic domains. The isolated nucleic acid components can be used to express one or more polypeptide components for in vivo (e.g. recombinant) synthesis of one or more polypeptides and/or polyketides as indicated above. It will also be appreciated that the leinamycin (lnm) cluster polypeptides can be used for ex vivo assembly of various macromolecules.

I. Biosynthetic Pathway for Leinamycin in S. atroolivaceus.

Without being bound to a particular theory, it is believed that the genetic organization of the lnm gene cluster and the enzymology of Lnm biosynthesis closely parallel those of macrolide and hybrid peptide-polyketide metabolites such as bleomycin. The structural similarity of Lnm to macrolides in general and to thiazole-containing hybrid peptide-polyketide metabolites such as bleomycin in particular provides substantial data on a pathway for Lnm biosynthesis. We believe the synthesis of the Lnm macrolactam backbone involves a hybrid NRPS/PKS system consisting of 2 NRPS and 6 PKS modules, which, at each step, specify the amino acid or carboxylic acid to be incorporated, and the modifications associated with this particular cycle of elongation (FIG. 4). In such a hybrid NRPS/PKS biosynthesis (FIG. 4), Lnm biosynthesis starts with an NRPS module, NRPS-1, that activates L-alanine as an amino acylthioester and epimerizes the latter into D-alanine to set the stage for Lnm biosynthesis. In certain embodiments, however, D-alanine can be a direct substrate for NRPS-1. NRPS-2 generally specifies L-cysteine and is typically characterized by the signature Cy and Ox domains for thiazole formation. The NRPS-2-bound peptidyl intermediate is next elongated by the PKS-1 module, the KS domain of which is homologous to the KSs known to interact with NRPS modules in NRPS/PKS hybrids (PCT/US00/00445) and the ACP domain of which is loaded by the acyltransferase (AT) domain of lnmG, which should specify malonyl CoA as an extender unit (Haydock et al. (1995) FEBS Lett., 374: 246-248).

Chain elongation proceeds by sequential incorporation of five additional molecules of malonyl CoA, utilizing five additional PKS modules, PKS-2, PKS-3, PKS-4, PKS-5, and PKS-6. The loading of malonyl CoA to the ACP domains of all five PKS modules is catalyzed by the same AT domain of lnmG, whose substrate specificity for malonyl CoA as the extender unit for individual PKS modules is easily determined from sequence analysis of the AT domains (Haydock et al. (1995) FEBS Lett., 374: 246-248). Since the AT domain of lnmG calls for malonyl CoA as an extender unit for all six PKS modules, an additional methylation step must have taken place for the introduction of the methyl group at C-6. Macrolide ring hydroxylation is often catalyzed by cytochrome P-450 hydroxylases that have been identified from several macrolide biosynthesis gene clusters (Rodriguez et al. (1995) FEMS Microbiol. Lett. 127: 117-120; Weber et al. (1991) Science, 252: 114-117; Xue et al. (1998) Chem. Biol. 5: 661-667).

Candidates for these types of hydroxylases have indeed been identified for leinamycin biosynthesis, such as lnmA, lnmB, and lnmZ (FIG. 4). Methylation of the macrolide ring has been noted before, with the methyl group derived from S-adenosylmethionine (AdoMet) and the reaction catalyzed by a methyl transferase domain (Du et al. (2000) Chem. Biol., 7: 623-642). An MT domain has been identified in PKS-4 of lnmJ, which presumably is responsible for the introduction of the C-6 methyl group of leinamycin (FIG. 4). Finally, modeled on the biosynthesis of rifamycin, the only macrolactam whose biosynthesis gene cluster has been characterized to date (August et al. (1998) Chem. Biol., 5: 69-79), we propose that a discrete amide synthase catalyzes the release of the full length peptide/polyketide intermediate from the Lnm synthase complex and the cyclization of the linear intermediate into macrolactam.
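The proposed order of modules can be summarized as a small data structure. The Python sketch below is an informal illustration and not part of the characterized pathway: the class and variable names are hypothetical, and only the substrates and signature domains named in this section (Cy and Ox in NRPS-2, MT in PKS-4) are recorded; the full domain composition of each module is deliberately left out.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str
    kind: str            # "NRPS" or "PKS"
    substrate: str       # building block incorporated by this module
    noted_domains: tuple = ()  # only domains explicitly named in the text


# Module order as proposed in this section (see FIG. 4).
LNM_ASSEMBLY_LINE = (
    Module("NRPS-1", "NRPS", "D-alanine"),
    Module("NRPS-2", "NRPS", "L-cysteine", ("Cy", "Ox")),
    Module("PKS-1", "PKS", "malonyl CoA"),
    Module("PKS-2", "PKS", "malonyl CoA"),
    Module("PKS-3", "PKS", "malonyl CoA"),
    Module("PKS-4", "PKS", "malonyl CoA", ("MT",)),
    Module("PKS-5", "PKS", "malonyl CoA"),
    Module("PKS-6", "PKS", "malonyl CoA"),
)


def building_blocks(modules):
    """Substrates in the order in which the modules incorporate them."""
    return [m.substrate for m in modules]


def module_counts(modules):
    """Number of modules of each type."""
    return Counter(m.kind for m in modules)
```

Reading the tuple from first to last element reproduces the proposed order of incorporation: one D-alanine, one L-cysteine, and six malonyl CoA units, corresponding to 2 NRPS and 6 PKS modules.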

II. Leinamycin (lnm) Gene Cluster.

The nucleic acids comprising the leinamycin (lnm) gene cluster are identified in Table 2 and listed in the sequence listing provided herein (SEQ ID NO:1). In particular, Table 1 identifies genes and functions of open reading frames (ORFs) responsible for the biosynthesis of leinamycin, while Table 2 identifies a number of ORFs comprising the leinamycin (lnm) gene cluster, identifies the activity of the catalytic domain encoded by each ORF, and provides primers for the amplification and isolation of that ORF.
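The positions and primers in Table 2 can be cross-checked against the amino acid counts in Table 1 with simple arithmetic. The following Python sketch is illustrative only; the function names are hypothetical, and it assumes that each listed position range is inclusive and includes the stop codon, and that a start position larger than the end position denotes an ORF on the complementary strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_length_aa(start, end):
    """Amino acids encoded by an ORF spanning start..end (inclusive, stop codon included)."""
    length_nt = abs(end - start) + 1
    if length_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return length_nt // 3 - 1  # subtract the stop codon


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def reverse_primer_covers_stop(primer):
    """True when the reverse primer anneals across the ORF's stop codon."""
    return reverse_complement(primer)[-3:] in STOP_CODONS
```

For ORF −35, positions 3489-4358 span 870 nucleotides, i.e. 290 codons or 289 amino acids plus a stop codon, matching the 289 amino acids listed in Table 1. For ORF −28 (16277-15681, complementary strand) the same calculation gives 198 amino acids. The reverse primer for ORF −35, CTACTGGCCGCTGCCCAG, has the reverse complement CTGGGCAGCGGCCAGTAG, which ends in the stop codon TAG.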

As illustrated in Example 1, the leinamycin (lnm) cluster comprises NRPS modules followed by several PKS modules as well as several resistance and regulatory genes (Table 1).

TABLE 1. Summary of functions of open reading frames (ORFs) in the leinamycin biosynthetic gene cluster and flanking regions. Sequence ID Nos refer to amino acid sequences encoded by ORF.

ORF# | Amino acids | Proposed function | Sequence homologue (Genbank accession no) | SEQ ID NO
−35 | 289 | Probable antibiotic resistance protein | SpcN (AAD50455) | 2
−34 | 502 | Putative FAD-dependent oxygenase | EncM (AAF81732) | 3
−33 | 1237 | Subtilisin-like secreted protease | SAM-P45 (BAA12040) | 4
−32 | 262 | Unknown | | 5
−31 | 401 | Probable NADH dehydrogenase II | (E75456) | 6
−30 | 306 | RNA polymerase ECF-type sigma factor | BH0672 (BAB04391) | 7
−29 | 327 | Probable macrolide 2′-phosphotransferase | MphA (D16251) | 8
−28 | 198 | Probable tetR-family transcriptional regulator | (T37015) | 9
−27 | 538 | Antibiotic efflux protein | ActVA-1 (S18539) | 10
−26 | 300 | Putative hydroxylase | SnoaW (AAF01810) | 11
−25 | 197 | Probable cyanamide hydratase | (P22143) | 12
−24 | 353 | Histidinol-phosphate aminotransferase | (P45358) | 13
−23 | 774 | Putative transcriptional regulator | (T34847) | 14
−22 | 72 | MbtH-like protein | (AAG02368) | 15
−21 | 1105 | Nonribosomal peptide synthetase | (T30289) | 16
−20 | 330 | Probable regulatory protein | SyrP (U88574) | 17
−19 | 335 | Probable regulatory protein | SyrP (U88574) | 18
−18 | 313 | Unknown | | 19
−17 | 341 | Putative fatty acid desaturase | (AAA99932) | 20
−16 | 433 | Diaminopimelate (DAP) decarboxylase | (P00861) | 21
−15 | 794 | Putative peptidase | (NP_422131) | 22
−14 | 432 | Antibiotic transport protein | SpcT (AAD50454) | 23
−13 | 1134 | Nonribosomal peptide synthetase | (T30289) | 24
−12 | 276 | Conserved, function unknown | (NP_421851) | 25
−11 | 549 | Nonribosomal peptide synthetase | NosA (AF204805) | 26
−10 | 235 | Thioesterase | (AAC01736) | 27
−9 | 276 | Short-chain dehydrogenase/reductase | LimC (CAB54559) | 28
−8 | 920 | Nonribosomal peptide synthetase | AcmB (T14591) | 29
−7 | 195 | Probable N-carbamoyl-sarcosine amidase | Ta0454 (CAC11596) | 30
−6 | 343 | Hydrogenase expression/formation protein | HypE (P24193) | 31
−5 | 791 | Hydrogenase maturation protein (regulator) | HypF (P30131) | 32
−4 | 447 | Serine hydroxymethyltransferase (SHMT) | GlyA (O29406) | 33
−3 | 238 | Probable glutamine amidotransferase | (C83609) | 34
−2 | 1745 | Nonribosomal peptide synthetase | BlmVI (AF210249) | 35
−1 | 462 | Nonribosomal peptide synthetase | PvdD (S53999) | 36
lnmA | 399 | Cytochrome P450 hydroxylase | RapN (T30231) | 37
lnmB | 78 | Ferredoxin | (T30230) | 38
lnmC | 115 | Unknown | | 39
lnmD | 438 | Probable 3-oxoadipate enol-lactone hydrolase/4-carboxymuconolactone decarboxylase | PcaL (AAC38246) | 40
lnmE | 307 | Unknown | | 41
lnmF | 265 | Probable enoyl-CoA hydratase | PksH (P40805) | 42
lnmG | 795 | Probable malonyl-CoA acyltransferase/enoyl reductase | FenF (T44805) | 43
lnmH | 274 | Unknown | | 44
lnmI | 4437 | Hybrid nonribosomal peptide synthetase/polyketide synthase | MtaC/MtaB (AF188187) | 45
lnmJ | 7349 | Polyketide synthase/Tyrosine phenol-lyase/ketoadipate-enol lactone hydrolase | MtaB (AF188287)/PcaL (AAC38246) | 46
lnmK | 319 | Conserved, function unknown | TaD (CAB46503) | 47
lnmL | 86 | Acyl carrier protein (ACP, type II) | TaE (CAB46504) | 48
lnmM | 416 | ACP synthase (FabH homolog) | TaF (CAB46505) | 49
lnmN | 267 | Thioesterase (type II) | GrsT (P14686) | 50
lnmO | 227 | Probable transcriptional activator | NtcA (AAC14592) | 51
lnmP | 82 | Peptidyl carrier protein (PCP, type II) | (CAB99152) | 52
lnmQ | 516 | Nonribosomal peptide synthetase (A-domain only, type II) | (AAG02343) | 53
lnmR | 575 | ABC transporter component (ATP hydrolase) | MoaD (T45539) | 54
lnmS | 287 | ABC transporter component (membrane spanning protein) | AgaC (T45530) | 55
lnmT | 321 | ABC transporter component (membrane spanning protein) | AgaB (T45531) | 56
lnmU | 513 | ABC transporter component (periplasmic oligopeptide binding protein) | OphA (S77572) | 57
lnmV | 120 | Unknown | | 58
lnmW | 516 | 4-coumarate: CoA ligase | (T08074) | 59
lnmX | 243 | Conserved, function unknown | (CAC04222) | 60
lnmY | 474 | Antibiotic efflux protein | Mct (AAD32747) | 61
lnmZ | 400 | Cytochrome P450 hydroxylase | MycG (S51594) | 62
lnmZ′ | 134 | Unknown | | 63
+1 | 216 | Conserved, function unknown | (C70555) | 64
+2 | 272 | Thioesterase | GrsT (P14686) | 65
+3 | 345 | Conserved, function unknown | (CAC18692) | 66
+4 | 236 | Probable tetR-family transcriptional regulator | HemR (BAA21913) | 67
+5 | 539 | Antibiotic resistance protein | CarA (AAC32027) | 68
+6 | 322 | Probable hydrolase/lactonase | VgbB (AAC61670) | 69
+7 | 551 | ABC transporter | VarM (BAA96297) | 70
+8 | 469 | Adenosylhomocysteinase | SahH (CAB88907) | 71
+9 | 303 | 5,10-methylenetetrahydrofolate reductase | MetF (O54253) | 72

TABLE 2. ORFs, deduced functions, amino acid sequence homologs, and PCR primers for amplification of individual ORFs identified in the leinamycin biosynthetic gene cluster and its flanking regions

ORF# | Position | Proposed Function* | Protein Homologue | Forward primer (SEQ ID NO) | Reverse primer (SEQ ID NO)
−35 | 3489-4358 | Probable antibiotic resistance protein | SpcN (AAD50455) | ATGGAGATGTCCGACACC (73) | CTACTGGCCGCTGCCCAG (74)
−34 | 5108-6616 | Putative FAD-dependent oxygenase | EncM (AAF81732) | ATGAGCGACTTTTCCCGC (75) | TCAGCGGGACGCAGGCGG (76)
−33 | 7431-11144 | Subtilisin-like secreted protease | SAM-P45 (BAA12040) | TTGCCCAAGCTTCCCATC (77) | TCAGCGCAGGCCGAAGGC (78)
−32 | 11141-11812 | Unknown | | ATGGCGGACGAACCTGCG (79) | TCATCGTTCCGTCCTCCT (80)
−31 | 11809-13014 | Probable NADH dehydrogenase II | (E75456) | ATGAGCGCACGGCAGGAG (81) | TCACCGTGCCTCCCGGAC (82)
−30 | 13011-13931 | RNA polymerase ECF-type sigma factor | BH0672 (BAB04391) | GTGACCGACCCGACCGCC (83) | TCAGCGCGTCCCGACGTC (84)
−29 | 14271-15254 | Probable macrolide 2′-phosphotransferase | MphA (D16251) | ATGGTTGCGAACGAGGGT (85) | TCAGCCGAAGCGGCGGAA (86)
−28 | 16277-15681 | Probable tetR-family transcriptional regulator | (T37015) | ATGGGCCGCGTGTCCCAG (87) | TCAGTCCATGCGCTGCTG (88)
−27 | 16467-18083 | Antibiotic efflux protein | ActVA-1 (S18539) | GTGGCATCGCCACCCACC (89) | TCACTTGTCACCGCCGGT (90)
−26 | 18480-19382 | Putative hydroxylase | SnoaW (AAF01810) | ATGACTGCCGACAACCTG (91) | TCAGCCCAGGTAGAGGTC (92)
−25 | 20377-19784 | Probable cyanamide hydratase | (P22143) | ATGACACTGGACGACCTG (93) | TCAGTCGAGGCTGTTGGT (94)
−24 | 20662-21723 | Histidinol-phosphate aminotransferase | (P45358) | TTGACCACGCTCACGTTC (95) | TCACCGCACGAACGCGTT (96)
−23 | 21994-24318 | Putative transcriptional regulator | (T34847) | ATGGAATTCAGCTCGCGA (97) | TCAGGAGCCCGCGGCCAC (98)
−22 | 24524-24742 | MbtH-like protein | (AAG02368) | ATGAGCGATCGGGACAGT (99) | TCAGCTGCGGCCCGCCTG (100)
−21 | 24778-28095 | Nonribosomal peptide synthetase | (T30289) | ATGCAGACCCAGCTCTCC (101) | TCAGCGGCGCTGCGCGCC (102)
−20 | 28128-29120 | Probable regulatory protein | SyrP (U88574) | ATGACCATTGAGGTGCAC (103) | TCATGCGGGCACCTCGCC (104)
−19 | 29117-30124 | Probable regulatory protein | SyrP (U88574) | ATGACGCTCACCGACCTG (105) | TCATCGGCCGGCCGGCAG (106)
−18 | 30172-31113 | Unknown | | ATGCTGCTGCGCCCCACC (107) | TCAGCCGGCCGGGGCCGA (108)
−17 | 31140-32165 | Putative fatty acid desaturase | (AAA99932) | ATGACGCAGACCGCCCCC (109) | TCACGTCCACGGCGTGCT (110)
−16 | 32199-33500 | Diaminopimelate (DAP) decarboxylase | (P00861) | ATGAGACCCGACATGAGT (111) | TCACAGACCCTCGGGGAT (112)
−15 | 35984-33600 | Putative peptidase | (NP-422131) | ATGGCCGACACCCGTACC (113) | TCAGAGCACGTATCGGCG (114)
−14 | 37313-36015 | Antibiotic transport protein | SpcT (AAD50454) | GTGGCGCCGCGCACGCCG (115) | TCAGGTCCGTTCCGGTGC (116)
−13 | 40721-37317 | Nonribosomal peptide synthetase | (T30289) | ATGACCGAGACCCTGCCC (117) | TCAGCCCTCCAGCTTCTG (118)
−12 | 41548-40718 | Conserved, function unknown | (NP_A21851) | ATGCGATCCGTCCGCACC (119) | TCATCGCTGTCCCTCCGC (120)
−11 | 41709-43358 | Nonribosomal peptide synthetase | NosA (AF204805) | ATGACGGCCGACGATTCG (121) | TCAGGCGGGCGCCTGTTC (122)
−10 | 43412-44119 | Thioesterase | (AAC01736) | ATGTTGAGTGCGGCGGTT (123) | TCATGACGGCGTCCCGGC (124)
−9 | 44116-44946 | Short-chain dehydrogenase/reductase | LimC (CAB54559) | ATGAGCGGACGGCTCACG (125) | TCAACGGGCGCTGTAGCC (126)
−8 | 44970-47732 | Nonribosomal peptide synthetase | AcmB (T14591) | GTGTCGTCCAACTCCCCT (127) | TCAGGCCGTCCTCGCCGC (128)
−7 | 47820-48407 | Probable N-carbamoylsarcosine amidase | Ta0454 (CAC11596) | ATGAGCAAGGTCGCGGTC (129) | TCAGGGGGTGCGGAACAC (130)
−6 | 48545-49576 | Hydrogenase expression/formation protein | HypE (P24193) | TTGCCGACGGCCACGACG (131) | CAGCACAGGCGGGGAAG (132)
−5 | 49599-51974 | Hydrogenase maturation protein (regulator) | HypF (P30131) | ATGGCAGAGACCGAGCAG (133) | TCAGCGGCATTCGTTCGT (134)
−4 | 52006-53349 | Serine hydroxymethyltransferase (SHMT) | GlyA (O29406) | ATGCGGACCGCAGATCTG (135) | TCACCGGGACGCCTCTGT (136)
−3 | 53346-54062 | Probable glutamine amidotransferase | (C83609) | GTGAGCCGGCCGGTCATC (137) | TCAGACGGATGCCGCTGT (138)
−2 | 54157-59394 | Nonribosomal peptide synthetase | BlmVI (AF210249) | GTGCACACTCACGTCCGT (139) | TCAGCCTTGCTGCTGCAG (140)
−1 | 59420-60808 | Nonribosomal peptide synthetase | PvdD (S53999) | ATGGCCGTGACACTCAAG (141) | TCAACTCACCGCCGGCTG (142)
lnmA | 60948-62147 | Cytochrome P450 hydroxylase | RapN (T30231) | ATGTCGGCTACGAGGCGG (143) | TCACCATGCGATCGGCAG (144)
lnmB | 62159-62395 | Ferredoxin | (T30230) | ATGGCACGGGAGCAGAAC (145) | TCACGACAGGTCGAGCAC (146)
lnmC | 62682-63029 | Unknown | | ATGAAGTTCGCGATCGTC (147) | TTACTCGGCCACCCACAG (148)
lnmD | 63116-64432 | Probable 3-oxoadipate enol-lactone hydrolase/4-carboxymuconolactone decarboxylase | PcaL (AAC38246) | ATGACGGACGGCGCGATA (149) | TCACCGTGCGGCGCCGCT (150)
lnmE | 64500-65423 | Unknown | | ATGACCGACGCGGCGAGC (151) | TCAGAACCAGGCGGGCGC (152)
lnmF | 65441-66238 | Probable enoyl-CoA hydratase | PksH (P40805) | GTGACGGCCATCGGCCCG (153) | TCAGGGCCGCGGCTTCTC (154)
lnmG | 66268-68655 | Probable malonyl-CoA acyltransferase/enoyl reductase | FenF (T44805) | ATGGTGGCACTGGTTTTC (155) | TCAGCGGCGGGCGAGGAC (156)
lnmH | 68725-69549 | Unknown | | ATGACCACCCTGACCTTC (157) | CTAGCGGGCGTCCGGCAC (158)
lnmI | 69681-82994 | Hybrid nonribosomal peptide synthetase/polyketide synthase | MtaC/MtaB (AF188187) | ATGACCACCCTGACCTTC (159) | TCACCACTTCCGTCCTTC (160)
lnmJ | 82991-105040 | Hybrid polyketide synthase/tyrosine phenol-lyase/ketoadipate-enol lactone hydrolase | MtaB (AF188287)/PcaL (AAC38246) | GTGAACGTGCCCTCCGCA (161) | TCATGCCGGGTGCTCCTC (162)
lnmK | 105037-105996 | Conserved, function unknown | TaD (CAB46503) | ATGACCATCACCTCGTCG (163) | TCATGCTTCCCCCTTCGG (164)
lnmL | 105993-106253 | Acyl carrier protein (ACP, type II) | TaE (CAB46504) | ATGACCCAGGCACCACTG (165) | TCATCGCGGGGCTCCGCT (166)
lnmM | 106250-107500 | ACP synthase (FabH homolog) | TaF (CAB46505) | ATGACCGCGACCGGTGCC (167) | TCAGCGCCACGCGTACTG (168)
lnmN | 107557-108360 | Thioesterase (type II) | GrsT (P14686) | GTGTACGGCTCTCGGACG (169) | TCACGTGGCAACTTTATG (170)
lnmO | 108395-109078 | Probable transcriptional activator | NtcA (AAC14592) | ATGAACCTGCTGGATGTC (171) | TCAGACGCATCGGCTCTC (172)
lnmP | 109122-109370 | Peptidyl carrier protein (PCP, type II) | (CAB99152) | ATGTGGGACCACAAGTTC (173) | TCATCGGCCGGCTCCGTC (174)
lnmQ | 109367-110917 | Nonribosomal peptide synthetase (A-domain) | (AAG02343) | ATGAGCGGCGCCAAGCTG (175) | TCAGGACGCCGGGGCGAG (176)
lnmR | 112700-110973 | ABC transporter component | MoaD (T45539) | TTGAGCGCAGTCTTCGAC (177) | TCAGACCCCGTCGACTGC (178)
lnmS | 113560-112697 | ABC transporter component | AgaC (T45530) | ATGACGGCCCCGACGCCG (179) | TCAAGGCACGAACCTCGC (180)
lnmT | 114522-113557 | ABC transporter component | AgaB (T45531) | GTGACGTCCGCCGTCCGG (181) | TCATGTCGCCGTCCTCAT (182)
lnmU | 116060-114519 | ABC transporter component | OphA (S77572) | ATGTCACGGGTCAACGGC (183) | TCACGCGGACCTGGCCCG (184)
lnmV | 116494-116132 | Unknown | | ATGAGCACCGACACGGAG (185) | TCAGGCCCACCAGTCGCG (186)
lnmW | 118041-116491 | 4-coumarate:CoA ligase | (T08074) | ATGACGGAACGGACGTTC (187) | TCATGACGGGGCTCCTGT (188)
lnmX | 118780-118049 | Conserved, function unknown | (CAC04222) | ATGGCCGACACACTCCTC (189) | TCAACCCACTATCTGGAA (190)
lnmY | 120239-118815 | Antibiotic efflux protein | Mct (AAD32747) | ATGACCGTCAGGACCGAC (191) | TCAGGCGGCGGCGTCGGT (192)
lnmZ | 121638-120436 | Cytochrome P450 hydroxylase | MycG (S51594) | GTGAGCACCGAAGTGGAA (193) | TCACCACTCGACGTGCAT (194)
lnmZ′ | 121757-122161 | Unknown | | ATGACTCAGATGCGGATT (195) | CTAGGCAGCCCCGTCGGT (196)
+1 | 122832-122182 | Conserved, function unknown | (C70555) | ATGGCGCCCGGCTCCGGC (197) | TCAGCCCTTCCCGGCCGC (198)
+2 | 123664-122846 | Thioesterase | GrsT (P14686) | GTGGACCGAGAGGGGAAC (199) | TCAGAACGTCCGCTCGGC (200)
+3 | 123898-124935 | Conserved, function unknown | (CAC18692) | ATGACCGGCACGCTCGTG (201) | TCACCAACTGGTCCTGCT (202)
+4 | 125516-124806 | Probable tetR-family transcriptional regulator | HemR (BAA21913) | ATGCCACCGCCTCCCCGA (203) | TCAGATCAGGGCGCGCCG (204)
+5 | 125637-127256 | Antibiotic resistance protein | CarA (AAC32027) | ATGCCTACGCAGATCAGC (205) | TCAGACCCGGACGGCCTG (206)
+6 | 127231-128199 | Probable hydrolase/lactonase | VgbB (AAC61670) | ATGCACCACAGGCCGTCC (207) | CTAGAGCTCCATGCGCAG (208)
+7 | 129971-128316 | ABC transporter | VarM (BAA96297) | ATGACCACCCACCCGAAC (209) | TCACGGTGTCACCGCTTC (210)
+8 | 131727-130318 | Adenosylhomocysteinase | SahH (CAB88907) | ATGCCCTCGCAGCCGCCC (211) | TCAGTAGCGGTAGTGGTC (212)
+9 | 132616-131705 | 5,10-methylenetetrahydrofolate reductase | MetF (O54253) | TTGAGCACGCTGCGCGAC (213) | TCAGCGGGCGGCTGCGAG (214)

*Function assignment was based on Gapped-BLAST and PSI-BLAST (Ref.1) analysis. The term “Probable” indicates that the biochemical function of at least one significant homologue has been confirmed; the term “Putative” indicates that the function of homologues is solely deduced from sequence similarity. ORFs translated on the complementary strand of DNA are underlined.
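
The Position entries of Table 2 can be cross-checked against the amino acid counts of Table 1. The following is a minimal sketch, not part of the original disclosure. It assumes that positions are 1-based and inclusive, that each ORF range includes its stop codon, and that a start coordinate larger than the end coordinate marks an ORF on the complementary strand. The function name orf_summary is hypothetical.

```python
def orf_summary(start: int, end: int) -> dict:
    """Summarize an ORF from a Table 2 'Position' entry.

    Assumptions (not stated in the source): coordinates are 1-based and
    inclusive, the range includes the stop codon, and start > end means the
    ORF lies on the complementary strand.
    """
    length_nt = abs(end - start) + 1
    if length_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return {
        "strand": "+" if start < end else "-",
        "length_nt": length_nt,
        # One codon is the stop codon, so it is not counted as a residue.
        "length_aa": length_nt // 3 - 1,
    }


if __name__ == "__main__":
    # Coordinates copied from Table 2.
    for name, start, end in [
        ("lnmA", 60948, 62147),
        ("lnmB", 62159, 62395),
        ("lnmR", 112700, 110973),
    ]:
        print(name, orf_summary(start, end))
```

Under these assumptions the computed lengths for lnmA, lnmB, and lnmR (399, 78, and 575 residues) agree with the amino acid counts listed for the same genes in Table 1.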

This invention also provides the cloning and sequence of a 4′-phosphopantetheinyl transferase (PPTase) from S. atroolivaceus. The PPTase sequence from S. atroolivaceus is named as “lnm” in the amino acid pileup analysis illustrated in FIG. 5. The rest are PPTases from other peptide and/or polyketide producing microorganisms. The Lnm PPTase described herein facilitates engineered biosynthesis of the leinamycin NRPS-PKS genes for generation of chemical structural diversity.

Using the sequence information provided herein (e.g. primer sequences and PPTase sequence) the PPTase nucleic acids can be routinely isolated according to standard methods (e.g. PCR amplification).

III. Expression of Leinamycin (lnm) Gene Clusters, Modules, and Enzymatic Domains.

As indicated above, in one embodiment this invention provides novel NRPS and PKS genes for the efficient recombinant production of both novel and known polyketides, peptides, and polyketide/polypeptide hybrids by expressing them in vivo. In other embodiments, such syntheses are carried out in vitro. Even in vitro syntheses, however, typically utilize recombinantly expressed PKSs, NRPSs, or enzymatic domains thereof. Thus, it is frequently desirable to express protein components of the PKSs or NRPSs described above.

Typically expression of the protein components of the pathway and/or of the products of the NRPS/PKS pathway is accomplished by placing the subject PKS or NRPS nucleic acid(s) in an expression vector, and transfecting a cell with the vector such that the cell expresses the desired product(s).

A) Isolation/Preparation of Nucleic Acids.

In one embodiment, this invention provides nucleic acids for the recombinant expression of a leinamycin. Such nucleic acids include isolated gene cluster(s) comprising open reading frames encoding polypeptides sufficient to direct the assembly of a leinamycin.

In other embodiments of this invention, modified leinamycins (e.g. leinamycin analogs), novel polyketides, polypeptides, and combinations thereof (polyketide/polypeptide hybrids) are created by modifying PKSs and/or NRPSs so as to introduce variations into known polymers synthesized by the enzymes. Such variations may be introduced by design, for example to modify a known molecule in a specific way, e.g. by replacing a single monomeric unit within a polymer with another, thereby creating a derivative molecule of predicted structure. Alternatively, variations can be made randomly, for example by making a library of molecular variants of a known polymer by systematically or haphazardly replacing one or more modules or enzymatic domains in a known PKS or NRPS with a collection of alternative modules or domains. Production of alternative/modified PKSs, NRPSs and hybrid systems is described below.

Using the primer and sequence information provided herein, one of ordinary skill in the art can routinely isolate/clone the leinamycin PKS and/or NRPS modules and/or enzymatic domains (ORFs) described herein. For example, the PCR primers provided in Table 2, above, can be used to amplify any of the ORFs identified therein. Moreover, using the sequence information for the leinamycin (lnm) gene cluster provided herein (see, e.g., SEQ ID NO:1), the design of other primers suitable for the amplification of individual ORFs, combinations of ORFs, genes, etc. is routine.
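
The primers in Table 2 appear to follow a simple pattern: the forward primer matches the first 18 nucleotides of the ORF (beginning at the start codon), and the reverse primer is the reverse complement of the last 18 nucleotides (ending at the stop codon). The sketch below illustrates that pattern only; it is an assumption drawn from the table, not a description of how the primers were actually designed. The function names and the placeholder ORF are hypothetical.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def end_primers(orf: str, length: int = 18) -> Tuple[str, str]:
    """Derive a forward/reverse primer pair from the two ends of an ORF.

    The forward primer is the first `length` bases of the coding strand;
    the reverse primer is the reverse complement of the last `length` bases.
    """
    orf = orf.upper()
    if len(orf) < length:
        raise ValueError("ORF is shorter than the requested primer length")
    return orf[:length], reverse_complement(orf[-length:])


if __name__ == "__main__":
    # Placeholder ORF: the two ends are chosen to match the lnmB primers of
    # Table 2; the middle ("GCC" repeats) is filler and NOT real sequence.
    toy_orf = "ATGGCACGGGAGCAGAAC" + "GCC" * 5 + "GTGCTCGACCTGTCGTGA"
    fwd, rev = end_primers(toy_orf)
    print("Fwd:", fwd)
    print("Rev:", rev)
```

In practice the ORF sequence would be taken from the cluster sequence (SEQ ID NO:1) at the coordinates given in Table 2, and additional 5′ bases (for example, a restriction site) could be prepended to either primer.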

Typically such amplifications will utilize the DNA of an organism containing the requisite genes (e.g. Streptomyces atroolivaceus) as a template. Typical amplification conditions include a PCR mixture consisting of 5 ng of S. atroolivaceus genomic or plasmid DNA as template, 25 pmoles of each primer, 25 μM dNTP, 5% DMSO, 2 units of Taq polymerase, 1× buffer, with or without 20% glycerol in a final volume of 50 μL. PCR is carried out (e.g. on a Gene Amp PCR System 2400 (Perkin-Elmer/ABI)) with a cycling scheme as follows: initial denaturing at 94° C. for 5 min, 24-36 cycles of 45 sec at 94° C., 1 min at 60° C., 2 min at 72° C., followed by an additional 7 min at 72° C. One of skill will appreciate that optimization of such a protocol, e.g. to improve yield, etc. is routine (see, e.g., U.S. Pat. No. 4,683,202; Innis (1990) PCR Protocols A Guide to Methods and Applications Academic Press Inc. San Diego, Calif., etc). In addition, primers may be designed to introduce restriction sites and so facilitate cloning of the amplified sequence into a vector.
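
As a rough planning aid, the arithmetic implied by the conditions above can be sketched as follows. The sketch is illustrative only: the function names are hypothetical, and the run-time estimate counts hold times only, ignoring the temperature ramping between steps, which depends on the instrument.

```python
def primer_concentration_uM(pmol: float, volume_uL: float) -> float:
    """Final primer concentration; pmol per microliter equals micromolar."""
    return pmol / volume_uL


def program_hold_minutes(cycles: int) -> float:
    """Total hold time of the cycling scheme, excluding ramp times."""
    initial_denaturation = 5.0           # 94 C for 5 min
    per_cycle = 45 / 60 + 1.0 + 2.0      # 45 sec + 1 min + 2 min
    final_extension = 7.0                # 72 C for 7 min
    return initial_denaturation + cycles * per_cycle + final_extension


if __name__ == "__main__":
    print(primer_concentration_uM(25, 50), "uM of each primer")
    for n in (24, 36):
        print(n, "cycles:", program_hold_minutes(n), "min of hold time")
```

With 25 pmoles of each primer in 50 μL, each primer is present at 0.5 μM, and the hold times sum to 102 min for 24 cycles and 147 min for 36 cycles.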

Using the information provided herein, other approaches to cloning the desired sequences will be apparent to those of skill in the art. For example, the PKS or NRPS modules or enzymatic domains of interest can be obtained from an organism that expresses the same, using recombinant methods, such as by screening cDNA or genomic libraries derived from cells expressing the gene, or by deriving the gene from a vector known to include the same. The gene can then be isolated and combined with other desired NRPS and/or PKS modules or domains, using standard techniques. If the gene in question is already present in a suitable expression vector, it can be combined in situ, with, e.g., other PKS and/or NRPS subunits, as desired. The gene of interest can also be produced synthetically, rather than cloned. The nucleotide sequence can be designed with the appropriate codons for the particular amino acid sequence desired. In general, one will select preferred codons for the intended host in which the sequence will be expressed. The complete sequence can be assembled from overlapping oligonucleotides prepared by standard methods and assembled into a complete coding sequence (see, e.g., Edge (1981) Nature 292:756; Nambiar et al. (1984) Science 223: 1299; Jay et al. (1984) J. Biol. Chem. 259:6311). In addition, it is noted that custom gene synthesis is commercially available (see, e.g. Operon Technologies, Alameda, Calif.).
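
The selection of preferred codons for an intended host can be illustrated with a minimal back-translation sketch. The codon table below is hypothetical and deliberately incomplete; it is not taken from this disclosure, and a real design would use codon-usage data for the chosen host and cover all twenty amino acids. The function name back_translate is likewise an assumption.

```python
# Hypothetical preferred-codon table (illustration only, not from the source).
# A real table would be derived from codon-usage data for the intended host.
PREFERRED_CODON = {
    "M": "ATG",
    "A": "GCC",
    "T": "ACC",
    "L": "CTG",
    "G": "GGC",
    "*": "TGA",  # stop codon
}


def back_translate(protein: str, table: dict = PREFERRED_CODON) -> str:
    """Encode a protein sequence using one preferred codon per residue.

    A stop codon is appended if the sequence does not already end with '*'.
    """
    protein = protein.upper()
    if not protein.endswith("*"):
        protein += "*"
    try:
        return "".join(table[residue] for residue in protein)
    except KeyError as err:
        raise ValueError(
            f"no preferred codon listed for residue {err.args[0]!r}"
        ) from None


if __name__ == "__main__":
    print(back_translate("MAL"))
```

The resulting coding sequence would then be split into overlapping oligonucleotides for assembly, or submitted to a commercial gene synthesis service.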

Examples of such techniques and instructions sufficient to direct persons of skill through many cloning exercises are found in Berger and Kimmel (1989) Guide to Molecular Cloning Techniques, Methods in Enzymology 152 Academic Press, Inc., San Diego, Calif. (Berger); Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2nd ed.) Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor Press, NY; Ausubel (1994) Current Protocols in Molecular Biology, Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc.; U.S. Pat. No. 5,017,478; and European Patent No. 0,246,864.

B) Expression Vectors and Host Cells.

A wide variety of expression vectors and host cells are suitable for the synthesis of leinamycin or leinamycin analogues, or the expression of polypeptides comprising the leinamycin biosynthetic pathway.

The choice of vector depends on the sequence(s) that are to be expressed. Any transducible cloning vector can be used as a cloning vector for the nucleic acid constructs of this invention. However, where large clusters are to be expressed, phagemids, cosmids, P1s, YACs, BACs, PACs, HACs or similar cloning vectors can be used for cloning the nucleotide sequences into the host cell. Phagemids, cosmids, and BACs, for example, are advantageous vectors due to the ability to insert and stably propagate therein larger fragments of DNA than in M13 phage and lambda phage, respectively. Phagemids which will find use in this method generally include hybrids between plasmids and filamentous phage cloning vehicles. Cosmids which will find use in this method generally include lambda phage-based vectors into which cos sites have been inserted. Recipient pool cloning vectors can be any suitable plasmid. The cloning vectors into which pools of mutants are inserted may be identical or may be constructed to harbor and express different genetic markers (see, e.g., Sambrook et al., supra). The utility of employing such vectors having different marker genes may be exploited to facilitate a determination of successful transduction.

In certain embodiments of this invention, vectors are used to introduce PKS, NRPS, or NRPS/PKS genes or gene clusters into host (e.g. Streptomyces) cells. Numerous vectors for use in particular host cells are well known to those of skill in the art. For example, Malpartida and Hopwood (1984) Nature, 309: 462-464; Kao et al. (1994) Science, 265: 509-512; and Hopwood et al. (1987) Methods Enzymol., 153: 116-166 all describe vectors for use in various Streptomyces hosts.

In a certain embodiment, Streptomyces vectors are used that include sequences that allow their introduction and maintenance in E. coli. Such Streptomyces/E. coli shuttle vectors have been described (see, for example, Vara et al., (1989) J. Bacteriol., 171:5872-5881; Guilfoile & Hutchinson (1991) Proc. Natl. Acad. Sci. USA, 88: 8553-8557.)

S. atroolivaceus is sensitive to thiostrepton (Thi) and apramycin (Apr). Thus, in one preferred embodiment the pGM60 vector (Muth et al. (1989) Mol. Gen. Genet., 219: 341-348), carrying the Thi^(R) marker, and the pKC1139 vector (Bierman et al. (1992) Gene, 116: 43-49), carrying the Apr^(R) marker, are particularly well suited for expression of lnm nucleic acids. Introduction of plasmid DNA into S. atroolivaceus by either polyethyleneglycol (PEG)-mediated transformation of protoplasts (Hopwood et al. (1985) Genetic manipulation of Streptomyces: a laboratory manual., John Innes Foundation: Norwich, UK) or by conjugation from E. coli S17 (Bierman et al. (1992) Gene, 116: 43-49) was successful, demonstrating the feasibility of manipulating Lnm biosynthesis in S. atroolivaceus in vivo.

The gene sequences, or fragments thereof, which collectively encode an lnm gene cluster, one or more ORFs, one or more lnm modules, or one or more lnm enzymatic domains of this invention, can be inserted into expression vectors, using methods known to those of skill in the art. Preferred expression vectors will include control sequences operably linked to the desired NRPS and/or PKS coding sequence or fragment thereof. Suitable expression systems for use with the present invention include systems that function in eukaryotic and prokaryotic host cells. However, as explained above, prokaryotic systems are preferred, and in particular, systems compatible with Streptomyces spp. are of particular interest. Control elements for use in such systems include promoters, optionally containing operator sequences, and ribosome binding sites. Particularly useful promoters include control sequences derived from PKS and/or NRPS gene clusters, such as one or more act promoters. However, other bacterial promoters, such as those derived from sugar metabolizing enzymes, such as galactose, lactose (lac) and maltose, will also find use in the present constructs. Additional examples include promoter sequences derived from biosynthetic enzymes such as tryptophan (trp), the beta-lactamase (bla) promoter system, bacteriophage lambda PL, and T5. In addition, synthetic promoters, such as the tac promoter (U.S. Pat. No. 4,551,433), which do not occur in nature also function in bacterial host cells. In Streptomyces, numerous promoters have been described including constitutive promoters, such as ermE and tcmG (Shen and Hutchinson, (1994) J. Biol. Chem. 269: 30726-30733), as well as controllable promoters such as actI and actIII (Pieper et al., (1995) Nature, 378: 263-266; Pieper et al., (1995) J. Am. Chem. Soc., 117: 11373-11374; and Wiesmann et al., (1995) Chem. & Biol. 2: 583-589).

Other regulatory sequences may also be desirable which allow for regulation of expression of the PKS replacement sequences relative to the growth of the host cell. Regulatory sequences are known to those of skill in the art, and examples include those which cause the expression of a gene to be turned on or off in response to a chemical or physical stimulus, including the presence of a regulatory compound. Other types of regulatory elements may also be present in the vector, for example, enhancer sequences.

Selectable markers can also be included in the recombinant expression vectors. A variety of markers are known which are useful in selecting for transformed cell lines and generally comprise a gene whose expression confers a selectable phenotype on transformed cells when the cells are grown in an appropriate selective medium. Such markers include, for example, genes that confer antibiotic resistance or sensitivity to the plasmid. Alternatively, several polyketides are naturally colored and this characteristic provides a built-in marker for selecting cells successfully transformed by the present constructs.

The various lnm PKS and/or NRPS clusters or subunits of interest can be cloned into one or more recombinant vectors as individual cassettes, with separate control elements, or under the control of, e.g., a single promoter. The PKS and/or NRPS subunits can include flanking restriction sites to allow for the easy deletion and insertion of other PKS subunits so that hybrid PKSs can be generated. The design of such unique restriction sites is known to those of skill in the art and can be accomplished using the techniques described above, such as site-directed mutagenesis and PCR.

Methods of cloning and expressing large nucleic acids such as gene clusters, including PKS- or NRPS-encoding gene clusters, in cells including Streptomyces are well known to those of skill in the art (see, e.g., Stutzman-Engwall and Hutchinson (1989) Proc. Natl. Acad. Sci. USA, 86: 3135-3139; Motamedi and Hutchinson (1987) Proc. Natl. Acad. Sci. USA, 84: 4445-4449; Grimm et al. (1994) Gene, 151: 1-10; Kao et al. (1994) Science, 265: 509-512; and Hopwood et al. (1987) Meth. Enzymol., 153: 116-166). In some examples, nucleic acid sequences of well over 100 kb have been introduced into cells, including prokaryotic cells, using vector-based methods (see, for example, Osoegawa et al., (1998) Genomics, 52: 1-8; Woon et al., (1998) Genomics, 50: 306-316; Huang et al., (1996) Nucl. Acids Res., 24: 4202-4209).

Host cells for the recombinant production of the leinamycin, leinamycin analogues, leinamycin shunt metabolites, lnm modules, ORFs, or catalytic domains, and the like can be derived from any organism with the capability of harboring a recombinant PKS, NRPS or PKS/NRPS gene cluster. Thus, the host cells of the present invention can be derived from either prokaryotic or eukaryotic organisms. However, preferred host cells are those constructed from the actinomycetes, a class of mycelial bacteria that are abundant producers of a number of polyketides and peptides. A particularly preferred genus for use with the present system is Streptomyces. Thus, for example, S. verticillus, S. ambofaciens, S. avermitilis, S. atroolivaceus, S. azureus, S. cinnamonensis, S. coelicolor, S. curacoi, S. erythraeus, S. fradiae, S. galilaeus, S. glaucescens, S. hygroscopicus, S. lividans, S. parvulus, S. peucetius, S. rimosus, S. roseofulvus, S. thermotolerans, S. violaceoruber, among others, will provide convenient host cells for the subject invention, with S. coelicolor being preferred (see, e.g., Hopwood, D. A. and Sherman, D. H. Ann. Rev. Genet. (1990) 24:37-66; O'Hagan, D. The Polyketide Metabolites (Ellis Horwood Limited, 1991), for a description of various polyketide-producing organisms and their natural products). Two of the common problems associated with heterologous gene expression in E. coli are (1) the formation of inclusion bodies, which requires additional steps of solubilization and refolding and often leads to inactive enzymes, and (2) the inadequate posttranslational processing of the resulting enzymes. These problems can be circumvented by expressing genes of Streptomyces origin in Streptomyces as indicated above. In preferred embodiments, cloning is performed using E. coli-Streptomyces shuttle vectors (e.g. pWHM3, pWHM601, pANT1200, pANT1201, pGM60, pKC1139). Using shuttle vectors, most of the subclonings can be easily carried out in E. coli. As indicated above, particularly preferred vectors include pGM60 and pKC1139. If controlled expression of the target gene is desired, pRM5 (McDaniel et al. (1993) Science, 262: 1546-1550) is a good choice since it carries the actI and actIII promoter pair, the transcription of which is under the control of the actII-ORF4 transcriptional regulator. This vector has been used for the expression of various PKS genes (McDaniel et al. (1999) Proc. Natl. Acad. Sci., USA, 96: 1846-1851; Pieper et al. (1995) Nature, 378: 263-266) and, very recently by us, of the Blm NRPS and PKS genes, in S. coelicolor and S. lividans.

In addition, a new family of 4′-phosphopantetheine transferases has been recently identified, which catalyze the posttranslational modification by the covalent attachment of the 4′-phosphopantetheine moiety of CoA to a conserved serine residue of either the PCP domain of NRPS or the ACP domain of PKS (Gehring et al. (1997) Chem. Biol., 4: 17-24; Lambalot et al. (1996) Chem. Biol., 3: 923-936; Walsh et al. (1997) Curr. Opinion Chem. Biol. 1: 309-315). It is now possible to overproduce functional holo-NRPS or PKS either in vivo by co-expression of the NRPS or PKS gene and a 4′-phosphopantetheine transferase gene (Du and Shen (1999) Chem. Biol., 6: 507-517; Cox et al. (1997) FEBS Lett., 405: 267-272; Ku et al. (1997) Chem. Biol., 4: 203-207) or in vitro by phosphopantetheinylation of an apo-ACP or apo-PCP with CoA in the presence of a 4′-phosphopantetheine transferase (Du and Shen (1999) Chem. Biol., 6: 507-517; Cox et al. (1997) FEBS Lett., 405: 267-272). Both in vivo and in vitro methods have been established in our laboratory (Du and Shen (1999) Chem. Biol., 6: 507-517) and can be applied to the production of functional Lnm NRPS and PKS enzymes, should the need of posttranslational modification arise.

In certain embodiments this invention may make use of genetically engineered cells that either lack PKS and/or NRPS genes or have their naturally occurring PKS and/or NRPS genes substantially deleted. These host cells can be transformed with recombinant vectors, encoding a variety of PKS and/or NRPS gene clusters, for the production of active polyketides. The invention provides for the production of significant quantities of product, e.g. a leinamycin, at an appropriate stage of the growth cycle. The leinamycin, leinamycin analogues, or other polyketide, peptide, or hybrid polyketide/peptide metabolites so produced can be used as therapeutic agents, to treat a number of disorders, depending on the type of metabolites in question.

The vectors and host cells described above can be used to express various protein components of the polyketide and/or polypeptide synthetic modules for subsequent isolation and/or to provide a biological synthesis of one or more desired biomolecules (e.g. leinamycin, leinamycin analogues, and the like). Where leinamycin, and/or leinamycin analogues, and/or one or more proteins of the leinamycin (lnm) cluster are expressed (e.g. overexpressed) for subsequent isolation and/or characterization, the proteins are expressed in any prokaryotic or eukaryotic cell suitable for protein expression. In one preferred embodiment, the proteins are expressed in E. coli. Overexpression of leinamycin in E. coli is described in Example 1.

In certain embodiments, a eukaryotic host cell is preferred (e.g. where certain glycosylation patterns are desired). Suitable eukaryotic host cells are well known to those of skill in the art. Such eukaryotic cells include, but are not limited to, yeast cells, insect cells, plant cells, fungal cells, and various mammalian cells (e.g. COS, CHO, and HeLa cell lines and various myeloma cell lines).

C) Protein/Polyketide Recovery.

Polypeptide and/or polyketide recovery is accomplished according to standard methods well known to those of skill in the art. Thus, for example, where lnm cluster proteins are to be expressed and isolated, the proteins can be expressed with a convenient tag to facilitate isolation (e.g. a His₆ tag). Other standard protein purification techniques are suitable and well known to those of skill in the art (see, e.g., Quadri et al. (1998) Biochemistry 37: 1585-1595; Nakano et al. (1992) Mol. Gen. Genet. 232: 313-321, etc.).

Similarly, where components (e.g. modules and/or enzymatic domains) of the leinamycin cluster are used to express various biomolecules (e.g. polyketides, polypeptides, etc.), the desired product and/or shunt metabolite(s) are isolated according to standard methods well known to those of skill in the art (see, e.g., Carreras and Khosla (1998) Biochemistry, 37: 2084-2088; Deutscher (1990) Methods in Enzymology Volume 182: Guide to Protein Purification, M. Deutscher, ed., and the like). Hara et al. (1989) J. Antibiot. 42: 1768-1774 discloses an effective culture system that, with minor modifications, was used to express leinamycin (see Example 1). Purification of expressed leinamycin is also described in Example 1.

D) Optimized Expression System

Four methods are typically used for introduction of plasmid DNA into Streptomyces species: PEG-mediated protoplast transformation, electroporation, conjugation, and phage infection. Standard protocols are available in the Streptomyces laboratory manual (Hopwood et al. (1985) Genetic manipulation of Streptomyces: a laboratory manual., John Innes Foundation: Norwich, UK), and several different transformation systems have been described for various Streptomyces species (Liu and Shen (2000) Antimicrob. Agents Chemother., 44: 382-392; Lichenstein et al. (1990) Gene, 88: 81-86; Zang et al. (1997) J. Ferment. Bioeng., 83: 217-221; Matsushima and Baltz (1985) J. Bacteriol., 162: 180-185; Garcia-Dominguez et al. (1987) App. Environ. Microbiol., 53: 1376-1381; Aidoo et al. (1990) J. Gen. Microbiol., 136: 657-662; Illing et al. (1989) J. Gen. Microbiol., 135: 2289-2297).

Example 1 describes a transfection system optimized for S. atroolivaceus. A conjugation approach was pursued, with surprising success. Spores (1×10⁹) are heat-shocked for 20 min at 42° C. instead of 10 min at 50° C., followed by incubation at 30° C. for up to 6 hours. The germination of spores can be monitored by microscopic checks every 30 min from 4 hours after heat-shock. An E. coli S17-1 culture (bearing the desired construct) is freshly prepared and conjugation is conducted on modified ISP-4 medium. After incubation, e.g. at 28° C. for 5 days, apparent positive exconjugants are identified. We calculated the conjugation/integration efficiency for the non-self-replicating construct pYC12 as approximately 1.8×10⁻⁸, and the conjugation efficiency for a self-replicating construct as 5×10⁻⁷.
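
To give a sense of scale, the efficiencies above can be converted into an expected number of exconjugants per conjugation. This is a back-of-the-envelope sketch under a stated assumption: the text does not say what the efficiencies are normalized to, so the calculation assumes they are expressed per recipient spore. The function name is hypothetical.

```python
def expected_exconjugants(recipient_spores: float, efficiency: float) -> float:
    """Expected exconjugant count, ASSUMING efficiency is per recipient spore."""
    return recipient_spores * efficiency


if __name__ == "__main__":
    spores = 1e9  # spores used per conjugation, from the text
    for label, eff in [
        ("non-self-replicating (pYC12)", 1.8e-8),
        ("self-replicating", 5e-7),
    ]:
        print(f"{label}: about {expected_exconjugants(spores, eff):.0f}")
```

Under that assumption, 1×10⁹ spores would yield on the order of 18 exconjugants for the integrating construct and about 500 for a self-replicating construct.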

IV. Synthesis of Leinamycin and Leinamycin Analogues.

In one embodiment this invention provides methods of synthesizing leinamycins and recombinantly synthesized leinamycins. As indicated above, this is generally accomplished by providing an organism (e.g. a bacterial cell) containing sufficient components of the leinamycin gene cluster to direct synthesis of a complete leinamycin and/or leinamycin analogue.

In one embodiment, the entire leinamycin cluster, or a fragment thereof (e.g. designed to introduce a modification into the cluster through homologous recombination) is cloned into a Streptomyces strain (e.g., S. lividans, S. atroolivaceus, or S. coelicolor). Kao et al. (1994) Science, 265: 509-512, have cloned the 30 kb DEBS genes from Sacc. erythraea into S. coelicolor and produced 6-deoxyerythronolide B in S. coelicolor, and these methods can be used to construct an expression plasmid for heterologous expression of the leinamycin cluster. This method involves the transfer of DNA between a temperature-sensitive plasmid and a shuttle vector by means of a homologous double recombination event in E. coli (Sosio et al., (2000) Nature Biotechnol. 18: 343-345).

In one preferred embodiment, the two ends spanning the leinamycin cluster or fragment thereof, or recombinant construct, are cloned into a temperature-sensitive plasmid that is chloramphenicol resistant (Cm^(R)) such as pCK6. Streptomyces DNA is then rescued from a donor into the temperature-sensitive recipient by co-transforming E. coli with the Cm^(R) recipient plasmid and the apramycin resistant (Ap^(R)) pKC505 donor cosmid that contains the desired construct, followed by chloramphenicol and apramycin selection at 30° C. Colonies harboring both plasmids (Cm^(R), Ap^(R)) are shifted to 44° C. on chloramphenicol and apramycin plates, where only those cointegrates formed by a single recombination event between the two plasmids are viable. Surviving colonies are then propagated at 30° C. on chloramphenicol plates to select for recombinant plasmids formed by the resolution of cointegrates through a second recombination event. The desired construct is thereby cloned into the Cm^(R) temperature-sensitive plasmid and is ready to be moved into any expression plasmid by a similar homologous recombination event.

Another system, illustrated in Example 1, exploits the fact that S. atroolivaceus grows very well at 30° C. but does not grow at temperatures above 35° C. In addition, it is highly sensitive to both apramycin (Am) and thiostrepton (Thio). Thus, E. coli-Streptomyces shuttle vectors pOJ260 (suicide vector, Am^(R)) and pKC1139 (self-replicating vector, Am^(R)) can be used to make the desired lnm or lnm-modification constructs.

The methods and constructs of this invention can be used to alter expression of endogenous leinamycin. Using the lnm gene cluster information provided herein, one of skill in the art may regulate the synthesis of endogenous leinamycin. In particular, the expression of various ORFs comprising the lnm gene cluster may be increased or decreased to alter leinamycin synthesis levels.

Methods of altering the expression of endogenous genes are well known to those of skill in the art. Typically such methods involve altering or replacing all or a portion of the regulatory sequences controlling expression of the particular gene that is to be regulated. In a preferred embodiment, the regulatory sequences (e.g., the native promoter) upstream of one or more of the lnm ORFs are altered.

This is typically accomplished by the use of homologous recombination to introduce a heterologous nucleic acid into the native regulatory sequences. To downregulate expression of one or more lnm ORFs, simple mutations that either alter the reading frame or disrupt the promoter are suitable. To upregulate expression of the lnm ORF(s), the native promoter(s) can be substituted with heterologous promoter(s) that induce higher than normal levels of transcription. In a particularly preferred embodiment, nucleic acid sequences comprising the structural gene in question or upstream sequences are utilized for targeting heterologous recombination constructs. The use of homologous recombination to alter expression of endogenous genes is described in detail in U.S. Pat. No. 5,272,071, WO 91/09955, WO 93/09222, WO 96/29411, WO 95/31560, and WO 91/12650.

In addition, or alternatively, constructs can be introduced that express a particular ORF at higher levels than in the wild-type organism. For example, improvement of leinamycin production yield by engineering leinamycin biosynthesis has been demonstrated using lnmG as an example. LnmG is a di-domain protein with amino acid sequence homology to acyltransferase (AT) (the 1st domain) and enoyl reductase (ER) (the 2nd domain). Inactivation of lnmG yields an S. atroolivaceus mutant strain whose ability to produce leinamycin is completely abolished. This result unambiguously establishes that lnmG, and thereby the cloned gene cluster, encodes leinamycin production.

Introduction of an lnmG overexpression plasmid into the S. atroolivaceus lnmG mutant not only restores its ability to produce leinamycin but also results in an overproduction of leinamycin in comparison with the wild-type S. atroolivaceus strain. Thus, as shown in FIG. 6, the S. atroolivaceus lnmG mutant transformed with a low-copy-number (10) plasmid in which the expression of lnmG is under the control of the ermE promoter produces a similar level of leinamycin as the wild-type S. atroolivaceus strain. The S. atroolivaceus lnmG mutant transformed with a medium-copy-number (300) plasmid in which the expression of lnmG is under the control of the ermE promoter produces 3-5 fold more leinamycin than the wild-type S. atroolivaceus strain.

In one certain embodiment, this invention provides methods of synthesizing modified leinamycins or leinamycin analogs. Typically, in such embodiments, the leinamycin analogs are synthesized either by introducing specific perturbations into individual NRPS and/or PKS enzymatic domains or modules, or by reprogramming the linear order in which the NRPS or PKS enzymatic domains and/or modules appear in the leinamycin gene cluster. The former will lead to leinamycin analogs with targeted modifications at the leinamycin backbone and the latter will allow incorporation of other extension units in variable sequence into the biosynthesis of leinamycin.

In preferred embodiments, modification of the lnm gene cluster to yield leinamycin analogues is accomplished by one of two different approaches. In one approach, the lnm enzymatic domains and/or modules are altered in a directed manner (i.e. they are changed in a preselected way), while in another approach, random/haphazard alterations are introduced into the lnm cluster and the resulting products are screened to identify those with desired properties.

A) Synthesis of Leinamycin Analogs by Specific Engineering of the lnm Genes.

The lnm genes can be re-engineered by means of specific mutations or by reprogramming the linear order of the NRPS or PKS enzymatic domains or modules. In this approach, a wild-type lnm allele (ORF) is replaced (or recombined) with a mutant construct containing various lnm ORFs in a different order. These mutants are introduced into and expressed in an appropriate host (e.g., Streptomyces or a heterologous host). Since both NRPSs (Stachelhaus et al. (1995) Science, 269: 69-72) and PKSs (Donadio et al. (1993) Proc. Natl. Acad. Sci. USA, 90: 7119-7123; Donadio et al. (1995) J. Am. Chem. Soc., 117: 9105-9106; Cortes et al. (1995) Science, 268: 1487-1489) have shown considerable tolerance to reprogramming, it is expected that these modifications of the lnm cluster will result in the production of leinamycin analogs with predicted structural alterations.

Using this approach, rational manipulations of genes governing Lnm biosynthesis allow preparation of novel Lnm analogs that are extremely difficult to prepare by chemical modifications or that present a formidable task by total synthesis. Examples of such analogs include 8-dehydroxyl-Lnm, 6-demethyl-Lnm, and 8-dehydroxyl-6-demethyl-Lnm (FIG. 7), which can be generated by inactivating the genes encoding the C-8 hydroxylases, such as lnmA, lnmB, or lnmZ, and the MT domain of PKS-4 of lnmJ encoding the C-6 methyl transferase, individually or both, respectively (FIG. 4, FIG. 7).

The stability of Lnm under aqueous conditions depends on the pH: t_(1/2)>20 hr at pH 6, t_(1/2)=8 hr at pH 7, and t_(1/2)<1 hr at pH 8 (Asai et al. (1997) Bioorg. Med. Chem. 5: 723-729). In the presence of a thiol, Lnm exhibits an even shorter t_(1/2) and is inactivated by degradation to form two major adducts, A and B (FIG. 8). Removal of the 8-hydroxyl group, as in 8-dehydroxyl-Lnm, should therefore eliminate the formation of adduct B, effectively enhancing the concentration of the active episulfonium ion form. Adduct A results from nucleophilic attack on the episulfonium ion by H₂O instead of by DNA (the latter leading to alkylative DNA cleavage). Since the electrophilic center (C-6) of the episulfonium ion is a 3° carbon in Lnm, the nucleophilic attack on the latter by H₂O or the -NH₂ group of DNA likely proceeds via an S_(N)1 mechanism. Consequently, little discrimination between the two nucleophiles is observed (Asai et al. (1997) Bioorg. Med. Chem. 5: 723-729). In contrast, 6-demethyl-Lnm will generate an episulfonium ion with a 2° carbon as the electrophilic center. The latter intermediate should be more stable and is likely to alkylate DNA via an S_(N)2 mechanism. Consequently, being a much stronger nucleophile than H₂O, the -NH₂ group of DNA should be selectively alkylated, leading to DNA cleavage, and H₂O may not be a strong enough nucleophile to attack the episulfonium ion to form the inactive adduct A (or at least would compete unfavorably in the presence of DNA). Finally, 8-dehydroxyl-6-demethyl-Lnm should exhibit both improved stability and improved selectivity towards DNA cleavage, serving as a good drug candidate (FIG. 8).
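The half-life figures above can be made concrete with a short calculation. This is a minimal sketch that assumes simple first-order decay, which the text does not state; only the pH 7 value (t_(1/2) = 8 hr) is used because the pH 6 and pH 8 values are given only as bounds. The function names are hypothetical.

```python
import math


def rate_constant(t_half_hr: float) -> float:
    """First-order rate constant (per hour) from a half-life: k = ln 2 / t_half."""
    return math.log(2) / t_half_hr


def fraction_remaining(t_half_hr: float, elapsed_hr: float) -> float:
    """Fraction of intact compound left after `elapsed_hr`, assuming first-order decay."""
    return 0.5 ** (elapsed_hr / t_half_hr)


T_HALF_PH7 = 8.0  # hours, value quoted in the text for pH 7

k = rate_constant(T_HALF_PH7)
print(f"k at pH 7: {k:.4f} per hour")
for hours in (1, 8, 24):
    print(f"after {hours:>2} hr: {fraction_remaining(T_HALF_PH7, hours):.3f} remaining")
```

Under this assumption, roughly one eighth of the starting material would remain after 24 hr at pH 7, which illustrates why analogs with improved aqueous stability are of interest.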

Other preferred embodiments contemplate the synthesis of Lnm analogs with an oxazole or bithiazole moiety (oxazolyl-Lnm or dithiazolyl-Lnm, respectively) (FIG. 7). The former could be prepared by replacing the cysteine-specific A domain at NRPS-2 with a serine-specific A domain (Shen et al. (1999) Bioorg. Chem., 27: 155-171), and the latter could be effected by replacing the NRPS-2 module with the bithiazole-forming NRPS modules from the bleomycin gene cluster (see PCT/US00/00445). Since it is known that oxazole and thiazole play an important role in drug-DNA recognition (Li et al. (1996) Science 274: 1188-1193; Roy et al. (1999) Nat. Prod. Rep., 16: 249-263), it is reasonable to assume that these novel oxazole- or bithiazole-containing Lnm analogs may exhibit improved efficacy as anticancer agents. These five examples are only representative of the types of Lnm analogs that could be prepared by rational engineering of the Lnm NRPS/PKS genes, with the choice being influenced by considerations of the mechanism of action of Lnm. We envisage that various other permutations can be introduced into Lnm by genetic manipulation of the lnm NRPS and PKS genes.
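The A-domain replacement described above can be pictured as a simple data-structure edit. The following is an illustrative sketch only: the `Domain` and `Module` classes, the `swap_a_domain` helper, and the Cy/A/PCP domain list given for NRPS-2 are hypothetical placeholders, not the actual domain organization disclosed for the cluster.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Domain:
    kind: str                        # e.g. "A", "PCP", "Cy" (placeholder labels)
    substrate: Optional[str] = None  # only meaningful for A domains


@dataclass(frozen=True)
class Module:
    name: str
    domains: Tuple[Domain, ...]


def swap_a_domain(module: Module, new_substrate: str) -> Module:
    """Return a copy of `module` whose A domain(s) specify `new_substrate`."""
    new_domains = tuple(
        replace(d, substrate=new_substrate) if d.kind == "A" else d
        for d in module.domains
    )
    return replace(module, domains=new_domains)


# Hypothetical domain layout, for illustration only.
nrps2 = Module("NRPS-2", (Domain("Cy"), Domain("A", "Cys"), Domain("PCP")))
oxazolyl_variant = swap_a_domain(nrps2, "Ser")

print(nrps2)
print(oxazolyl_variant)
```

Keeping the records immutable means the wild-type module description is left untouched, mirroring the idea that each engineered construct is a distinct variant of the parent cluster.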

Engineering of both NRPS and PKS by either domain or module swapping has been very successful for making novel peptides (Stachelhaus et al. (1995) Science, 269: 69-72; Belshaw et al. (1999) Science, 284: 486-489; Stachelhaus et al. (1999) Chem. Biol. 6: 493-505; de Ferra et al. (1997) J. Biol. Chem., 272: 25304-25309; Elsner et al. (1997) J. Biol. Chem., 272: 4814-4819; Schneider et al. (1998) Mol. Gen. Genet., 257: 308-318; Stachelhaus and Marahiel (1996) Biochem. Pharmacol., 52: 177-186) and polyketides (Donadio et al. (1993) Proc. Natl. Acad. Sci., USA 90: 7119-7123; Cortes et al. (1995) Science, 268: 1487-1489; Kao et al. (1994) Science, 265: 509-512; Kao et al. (1994) J. Am. Chem. Soc., 116: 11612-11613; Kao et al. (1995) J. Am. Chem. Soc., 117: 9105-9106; McDaniel et al. (1997) J. Am. Chem. Soc., 119: 4309-4310; Pieper et al. (1997) Biochemistry, 36: 1846-1851; Bedford et al. (1996) Chem. Biol., 3: 827-831; Oliynyk et al. (1996) Chem. Biol., 3: 833-839; McDaniel et al. (1999) Proc. Natl. Acad. Sci., USA, 96: 1846-1851; Jacobsen et al. (1997) Science, 277: 367-369; Gokhale et al. (1999) Science, 284: 482-485; Ruan et al. (1997) J. Bacteriol., 179: 6416-6425; Stassi et al. (1998) Proc. Natl. Acad. Sci., USA, 95: 7305-7309) with the desired structural alterations. Domain or module boundaries for both NRPS and PKS are well defined, although the effectiveness of an individual domain or module swapping experiment is preferably determined empirically.

In certain preferred embodiments, macrolactam products are isolated and subjected to mass spectrum and 1-D and 2-D NMR analyses to determine whether inactivation of lnmA, lnmB, lnmZ, and/or the MT domain of PKS-3 in lnmJ has resulted in the production of 8-dehydroxyl-Lnm or 6-demethyl-Lnm, respectively. A similar strategy is used to carry out the double inactivation of the hydroxylase and methyl transferase genes for the production of 8-dehydroxyl-6-demethyl-Lnm, as well as the desired domain or module replacement of the Lnm NRPS-2 to construct S. atroolivaceus recombinant strains producing oxazolyl-Lnm and dithiazolyl-Lnm.

Although in vivo manipulation of Lnm biosynthesis in S. atroolivaceus has been demonstrated as feasible herein, in certain embodiments, methods to clone the entire lnm gene cluster into, e.g., S. lividans or S. coelicolor, either by the newly developed E. coli-Streptomyces artificial chromosome (Sosio et al. (2000) Nature Biotechnol. 18: 343-345) or by the multi-plasmid approach (Tang et al. (2000) Science, 287: 640-642; Xue et al. (1999) Proc. Natl. Acad. Sci., USA, 96: 11740-11745) (up to three compatible Streptomyces vectors), are available and can be used.

Production of novel leinamycins by engineering leinamycin biosynthesis has been demonstrated with lnmH as an example. LnmH is a protein of unknown function on the basis of amino acid sequence analysis. Inactivation of lnmH yields an S. atroolivaceus mutant that no longer produces leinamycin but accumulates at least two new leinamycin metabolites, as detected by HPLC analysis (see FIG. 9). The production of these new metabolites results exclusively from the inactivation of ORF43. Complementation of the lnmH mutant by overexpression of lnmH under the ermE* promoter in a low-copy-number plasmid restores leinamycin production to the mutant strain, with the same metabolite profile as the wild-type S. atroolivaceus strain.

B) Synthesis of Leinamycin Analogs by “Random” Modification of lnm Genes.

Leinamycin analogs can also be synthesized by randomly/haphazardly altering genes in the lnm cluster, expressing the products of the randomly modified megasynthetase, and then screening the products for the desired activity. Methods of “randomly” altering lnm cluster genes are described below.

V. Generation of Other Synthetic Systems.

In addition to the production of leinamycin or modified leinamycins, the leinamycin gene cluster or elements thereof can be used by themselves or in combination with NRPS and/or PKS modules and/or enzymatic domains of other PKS and/or NRPS systems to produce a wide variety of compounds including, but not limited to, various polyketides, polypeptides, polyketide/polypeptide hybrids, various oxazoles and thiazoles, various sugars, various methylated polypeptides/polyketides, and the like. As with the production of modified leinamycins described above, such compounds can be produced, in vivo or in vitro, by catalytic biosynthesis using large, modular PKSs, NRPSs, and hybrid PKS/NRPS systems. The megasynthetases directing such syntheses can be rationally designed, e.g. by predetermined alteration/modification of polyketide and/or polypeptide and/or hybrid PKS/NRPS pathways. Alternatively, large combinatorial libraries of cells harboring various megasynthetases can be produced by the random modification of particular pathways and then selected for the production of a molecule or molecules of interest. It will be appreciated that, in certain embodiments, such libraries of megasynthetases/modified pathways can be used to generate large, complex combinatorial libraries of compounds which themselves can be screened for a desired activity.

A) Directed Modification of Biomolecules.

Elements (e.g. open reading frames) of the leinamycin biosynthetic gene cluster and/or variants thereof can be used in a wide variety of “directed” biosynthetic processes (i.e. where the process is designed to modify and/or synthesize one or more particular preselected metabolite(s)). Polypeptides encoded by particular open reading frames or combinations of open reading frames can be utilized to perform particular chemical modifications of biological molecules.

Thus, for example, open reading frames encoding a polypeptide synthetase can be used to chemically modify an amino acid by coupling it to another amino acid. One of skill in the art, utilizing the information provided here, can perform literally countless chemical modifications and/or syntheses using either “native” leinamycin biosynthesis metabolites as the substrate molecule, or other molecules capable of acting as substrates for the particular enzymes in question. Other substrates can be identified by routine screening. Methods of screening enzymes for specific activity against particular substrates are well known to those of skill in the art.

The biosyntheses can be performed in vivo, e.g. by providing a host cell comprising the desired leinamycin gene cluster open reading frame(s), and/or in vitro, e.g., by providing the polypeptides encoded by the leinamycin gene cluster ORFs and the appropriate substrates and/or cofactors.

B) Directed Engineering of Novel Synthetic Pathways.

In numerous embodiments of this invention, novel polyketides, polypeptides, and combinations thereof are created by modifying known PKSs or NRPSs so as to introduce variations into known polymers synthesized by the enzymes. Such variations may be introduced by design, for example to modify a known molecule in a specific way, e.g. by replacing a single monomeric unit within a polymer with another, thereby creating a derivative molecule of predicted structure. Such variations can also be made by adding one or more modules to a known PKS or NRPS, or by removing one or more modules from a known PKS or NRPS. Such novel PKSs or NRPSs can readily be made using a variety of techniques, including recombinant methods and in vitro synthetic methods.

Using any of these methods, it is possible to introduce PKS domains into a NRPS, or vice versa, thereby creating novel molecules including both peptide and polyketide structural domains. For example, a PKS enzyme producing a known polyketide can be modified so as to include an additional module that adds a peptide moiety into the polyketide. Novel molecules synthesized using these methods can be screened, using standard methods, for any activity of interest, such as antibiotic activity, effects on the cell cycle, effects on the cytoskeleton, etc.

Novel polyketides, polypeptides, or combinations thereof can also be made by creating novel PKSs or NRPSs de novo, using recombinant or in vitro synthetic methods. Such novel arrangements of domains can be designed, i.e. to create a specific polymer. In addition to creating novel PKSs or NRPSs by combining modules, the methods of this invention can also be used to make novel modules that can add new monomeric units to a growing polypeptide or polyketide chain. Because the identity of each module, and, consequently, the identity of the monomer added by the module, is determined by the identity and number of the functional domains comprising the module, it is possible to produce novel monomeric units by creating novel combinations of functional domains within a module. Such novel modules can be created by design, for example to make a specific module that will add a specific monomer to a polyketide or polypeptide, or can be created by the random association of domains so as to produce libraries of novel modules. Such novel modules can be made using recombinant or in vitro synthetic means.

Mutations can be made to the native NRPS and/or PKS subunit sequences and such mutants used in place of the native sequence, so long as the mutants are able to function with other NRPS and/or PKS subunits to collectively catalyze the synthesis of an identifiable polyketide and/or polypeptide. Such mutations can be made to the native sequences using conventional techniques such as by preparing synthetic oligonucleotides including the mutations and inserting the mutated sequence into the gene encoding a NRPS and/or PKS subunit using restriction endonuclease digestion (see, e.g., Kunkel (1985) Proc. Natl. Acad. Sci. USA 82: 448; Geisselsoder et al. (1987) BioTechniques 5: 786). Alternatively, the mutations can be effected using a mismatched primer (generally 10-20 nucleotides in length) that hybridizes to the native nucleotide sequence at a temperature below the melting temperature of the mismatched duplex. The primer can be made specific by keeping primer length and base composition within relatively narrow limits and by keeping the mutant base centrally located (Zoller and Smith (1983) Meth. Enzymol. 100: 468). Primer extension is effected using DNA polymerase, the product is cloned, and clones containing the mutated DNA, derived by segregation of the primer-extended strand, are selected. Selection can be accomplished using the mutant primer as a hybridization probe. The technique is also applicable for generating multiple point mutations (see, e.g., Dalbie-McFarland et al. (1982) Proc. Natl. Acad. Sci. USA 79: 6409). PCR mutagenesis will also find use for effecting the desired mutations.
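The design constraints on the mismatched primer described above (short length, centrally located mutant base) can be sketched in a few lines. This is an illustrative sketch, not a validated primer-design tool: the template sequence is made up, the function names are hypothetical, and the melting temperature is a rough estimate from the Wallace rule (2° C. per A/T, 4° C. per G/C), which ignores the destabilizing effect of the mismatch. The primer is written in the sense of the given strand, so it would anneal to the complementary strand.

```python
def mutagenic_primer(template: str, position: int, new_base: str, flank: int = 8) -> str:
    """Primer of length 2*flank + 1 with `new_base` centered at `position`."""
    if template[position] == new_base:
        raise ValueError("new_base must differ from the template base")
    start, end = position - flank, position + flank + 1
    if start < 0 or end > len(template):
        raise ValueError("not enough flanking sequence around position")
    return template[start:position] + new_base + template[position + 1:end]


def wallace_tm(primer: str) -> int:
    """Rough Tm estimate (deg C) by the Wallace rule; reasonable only for short oligos."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


template = "ATGGCTAGCTTGACCGGTACGATCGA"  # made-up sequence for illustration
primer = mutagenic_primer(template, position=12, new_base="G")

print("primer:", primer, f"({len(primer)} nt)")
print("approx. Tm of the perfectly matched duplex:", wallace_tm(primer), "deg C")
```

With the default flank of 8, the primer is 17 nucleotides long, which falls within the 10-20 nucleotide range mentioned in the text; annealing would then be carried out somewhat below the estimated melting temperature.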

C) Random Modification of PKS/NRPS Pathways.

In another embodiment, variations can be made randomly, for example by making a library of molecular variants of a known polymer by randomly mutating one or more PKS or NRPS modules and/or enzymatic domains or by randomly replacing one or more modules or enzymatic domains in a known PKS or NRPS with a collection of alternative modules and/or enzymatic domains.

The PKS and/or NRPS modules can be combined into a single multi-modular enzyme, thereby dramatically increasing the number of possible combinations obtained using these methods. These combinations can be made using standard recombinant or nucleic acid amplification methods, for example by shuffling nucleic acid sequences encoding various modules or enzymatic domains to create novel arrangements of the sequences, analogous to DNA shuffling methods described in Crameri et al. (1998) Nature 391: 288-291, and in U.S. Pat. Nos. 5,605,793 and 5,837,458. In addition, novel combinations can be made in vitro, for example by combinatorial synthetic methods. Novel polymers, or polymer libraries, can be screened for any specific activity using standard methods.
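The growth in the number of possible combinations can be illustrated with a small enumeration. This is a minimal sketch under the simplifying assumption that each position in a multi-modular enzyme can be filled independently from its own set of alternative modules; the module labels are hypothetical placeholders.

```python
from itertools import product
from typing import List, Sequence, Tuple


def count_assemblies(options_per_position: Sequence[Sequence[str]]) -> int:
    """Number of distinct assemblies: the product of the option counts."""
    total = 1
    for options in options_per_position:
        total *= len(options)
    return total


def enumerate_assemblies(options_per_position: Sequence[Sequence[str]]) -> List[Tuple[str, ...]]:
    """Explicitly list every assembly (practical only for small libraries)."""
    return list(product(*options_per_position))


# Hypothetical alternatives at three positions of a multi-modular enzyme.
positions = [
    ["PKS-a", "PKS-b"],
    ["NRPS-a", "NRPS-b", "NRPS-c"],
    ["PKS-c", "PKS-d"],
]

assemblies = enumerate_assemblies(positions)
print(count_assemblies(positions), "possible assemblies")
print("first assembly:", assemblies[0])

# With 10 alternatives at each of 6 positions the library is already large.
print(count_assemblies([range(10)] * 6), "assemblies for 6 positions x 10 options")
```

Because the count is multiplicative, adding even one more variable position multiplies the library size, which is why screening rather than exhaustive characterization is typically used for such libraries.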

Random mutagenesis of the nucleotide sequences obtained as described above can be accomplished by several different techniques known in the art, such as by altering sequences within restriction endonuclease sites, inserting an oligonucleotide linker randomly into a plasmid, by irradiation with X-rays or ultraviolet light, by incorporating incorrect nucleotides during in vitro DNA synthesis, by error-prone PCR mutagenesis, by preparing synthetic mutants, or by damaging plasmid DNA in vitro with chemicals. Chemical mutagens include, for example, sodium bisulfite, nitrous acid, hydroxylamine, agents which damage or remove bases thereby preventing normal base-pairing such as hydrazine or formic acid, analogues of nucleotide precursors such as nitrosoguanidine, 5-bromouracil, 2-aminopurine, or acridine intercalating agents such as proflavine, acriflavine, quinacrine, and the like. Generally, plasmid DNA or DNA fragments are treated with chemicals, transformed into E. coli, and propagated as a pool or library of mutant plasmids.
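The idea of a pool of randomly mutated sequences can be mimicked in silico. The following toy sketch assumes a uniform per-base substitution rate, which is a simplification and does not model the bias of any particular mutagen or of error-prone PCR; the sequence, rate, and function names are hypothetical.

```python
import random
from typing import List

BASES = "ACGT"


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Substitute each base independently with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            # Pick one of the three other bases so a hit always changes the base.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def make_pool(seq: str, rate: float, size: int, seed: int = 0) -> List[str]:
    """Generate `size` mutated copies of `seq` using a seeded generator."""
    rng = random.Random(seed)
    return [mutate(seq, rate, rng) for _ in range(size)]


wild_type = "ATGGCTAGCTTGACCGGTACGATCGA"  # made-up sequence for illustration
pool = make_pool(wild_type, rate=0.05, size=1000)

distinct = len(set(pool))
unchanged = sum(1 for s in pool if s == wild_type)
print(f"{distinct} distinct variants, {unchanged} unmutated copies in a pool of {len(pool)}")
```

Seeding the generator makes a simulated pool reproducible, which is convenient when comparing the effect of different assumed mutation rates on library diversity.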

Large populations of random enzyme variants can be constructed in vivo using “recombination-enhanced mutagenesis.” This method employs two or more pools of, for example, 10⁶ mutants each of the wild-type encoding nucleotide sequence that are generated using any convenient mutagenesis technique, described more fully above, and then inserted into cloning vectors.

D) Incorporation and/or Modification of Non-lnm Cluster Elements.

In either the directed or random approaches, nucleic acids encoding novel combinations of modules and/or enzymatic domains are introduced into a cell. In one embodiment, nucleic acids encoding one or more PKS or NRPS domains are introduced into a cell so as to replace one or more domains of an endogenous PKS or NRPS within a chromosome of the cell. Endogenous gene replacement can be accomplished using standard methods, such as homologous recombination. Nucleic acids encoding an entire PKS, NRPS, or combination thereof can also be introduced into a cell so as to enable the cell to produce the novel enzyme, and, consequently, synthesize the novel polymer. In a preferred embodiment, such nucleic acids are introduced into the cell optionally along with a number of additional genes, together called a ‘gene cluster,’ that influence the expression of the genes, survival of the expressing cells, etc. In a particularly preferred embodiment, such cells do not have any other PKS- or NRPS-encoding genes or gene clusters, thereby allowing the straightforward isolation of the polymer synthesized by the genes introduced into the cell.

Furthermore, the recombinant vector(s) can include genes from a single PKS and/or NRPS gene cluster, or may comprise hybrid replacement PKS gene clusters with, e.g., a gene for one cluster replaced by the corresponding gene from another gene cluster. For example, it has been found that ACPs are readily interchangeable among different synthases without an effect on product structure. Furthermore, a given KR can recognize and reduce polyketide chains of different chain lengths. Accordingly, these genes are freely interchangeable in the constructs described herein. Thus, the replacement clusters of the present invention can be derived from any combination of PKS and/or NRPS gene sets that ultimately function to produce an identifiable polyketide and/or peptide.

Examples of hybrid replacement clusters include, but are not limited to, clusters with genes derived from two or more of the act gene cluster, the whiE gene cluster, frenolicin (fren), granaticin (gra), tetracenomycin (tcm), 6-methylsalicylic acid (6-msas), oxytetracycline (otc), tetracycline (tet), erythromycin (ery), griseusin (gris), nanaomycin, medermycin, daunorubicin, tylosin, carbomycin, spiramycin, avermectin, monensin, nonactin, curamycin, rifamycin and candicidin synthase gene clusters, among others. (For a discussion of various PKSs, see, e.g., Hopwood and Sherman (1990) Ann. Rev. Genet. 24: 37-66; O'Hagan (1991) The Polyketide Metabolites, Ellis Horwood Limited.)

A number of hybrid gene clusters have been constructed, having components derived from the act, fren, tcm, gris and gra gene clusters (see, e.g., U.S. Pat. No. 5,712,146). Other hybrid gene clusters, as described above, can easily be produced and screened using the disclosure herein, for the production of identifiable polyketides, polypeptides or polyketide/polypeptide hybrids.

Host cells (e.g. Streptomyces) can be transformed with one or more vectors, collectively encoding a functional PKS/NRPS set (e.g. a leinamycin or leinamycin analog), or a cocktail comprising a random assortment of PKS and/or NRPS genes, modules, active sites, or portions thereof. The vector(s) can include native or hybrid combinations of PKS and/or NRPS subunits or cocktail components, or mutants thereof. As explained above, the gene cluster need not correspond to the complete native gene cluster but need only encode the necessary PKS and/or NRPS components to catalyze the production of the desired product. For example, in Streptomyces aromatic PKSs, carbon chain assembly requires the products of three open reading frames (ORFs). ORF1 encodes a ketosynthase (KS) and an acyltransferase (AT) active site (KS/AT); ORF2 encodes a chain length-determining factor (CLF), a protein similar to the ORF1 product but lacking the KS and AT motifs; and ORF3 encodes a discrete acyl carrier protein (ACP). Some gene clusters also code for a ketoreductase (KR) and a cyclase, involved in cyclization of the nascent polyketide backbone. However, it has been found that only the KS/AT, CLF, and ACP need be present in order to produce an identifiable polyketide. Thus, in the case of aromatic PKSs derived from Streptomyces, these three genes, without the other components of the native clusters, can be included in one or more recombinant vectors, to constitute a “minimal” replacement PKS gene cluster.
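The "minimal" cluster requirement stated above amounts to a set-membership check. The sketch below is illustrative only: the component labels are taken from the paragraph, while the function name and the example cluster compositions are hypothetical.

```python
from typing import FrozenSet, Iterable

# Components the text identifies as sufficient for an aromatic Streptomyces PKS.
MINIMAL_AROMATIC_PKS: FrozenSet[str] = frozenset({"KS/AT", "CLF", "ACP"})


def missing_minimal_components(cluster: Iterable[str]) -> FrozenSet[str]:
    """Return the minimal-PKS components absent from `cluster` (empty if complete)."""
    return MINIMAL_AROMATIC_PKS - frozenset(cluster)


# Hypothetical vector contents, for illustration only.
candidate_a = ["KS/AT", "CLF", "ACP", "KR", "cyclase"]
candidate_b = ["KS/AT", "KR"]

for label, cluster in (("candidate_a", candidate_a), ("candidate_b", candidate_b)):
    missing = missing_minimal_components(cluster)
    status = "complete" if not missing else "missing " + ", ".join(sorted(missing))
    print(f"{label}: {status}")
```

A check of this kind could be applied to the combined contents of several vectors, since the text notes that the required genes can be distributed over one or more recombinant vectors.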

E) Variation of Starter and Extender Units.

In addition to varying the PKS and/or NRPS modules and/or domains, variations in the products produced by various PKS/NRPS systems can be obtained by varying the starter units and/or the extender units. Thus, for example, a considerable degree of variability exists for starter units, e.g., acetyl CoA, malonamyl CoA, propionyl CoA, acetate, butyrate, isobutyrate and the like. In addition, naturally occurring PKSs and/or NRPSs have shown some tolerance for varying extender units.

F) Screening of Products.

Particularly where large combinatorial libraries are synthesized, e.g. using one or more modules and/or enzymatic domains of the lnm gene cluster, it will often be desired to screen the resulting compound(s) for the desired activity. Methods of screening compounds (e.g. polypeptides, polyketides, sugars, thiazoles, etc.) for various activities of interest (e.g. cytotoxicity, antimicrobial activity, particular chemical activities, etc.) are well known to those of skill in the art.

Where large numbers of compounds are produced, it is often desired to rapidly screen such compounds using “high throughput systems” (HTS). High throughput assay systems are well known to those of skill in the art and many such systems are commercially available (see, e.g., Zymark Corp., Hopkinton, Mass.; Air Technical Industries, Mentor, Ohio; Beckman Instruments, Inc. Fullerton, Calif.; Precision Systems, Inc., Natick, Mass., etc.). These systems typically automate entire procedures including all sample and reagent pipetting, liquid dispensing, timed incubations, and final readings of the microplate(s) in detector(s) appropriate for the assay. These configurable systems provide high throughput and rapid start up as well as a high degree of flexibility and customization. The manufacturers of such systems typically provide detailed protocols for the various high throughput screens.

VI. In Vitro Syntheses.

In additional embodiments of this invention, leinamycins and other polyketides and/or polypeptides are synthesized and/or modified in vitro. Individual enzymatic domains or modules can be used in vitro to modify a unit and/or to add a single monomeric unit to a growing polyketide or polypeptide chain. In one approach, a megasynthetase providing all the desired synthetic activities is recombinantly expressed and then provided with the appropriate substrates and buffer system, e.g. in a bioreactor, to direct the synthesis of the desired product. In another approach, various PKSs and/or NRPSs are provided in different solutions and the growing polymer chains can be sequentially introduced into the plurality of solutions, each containing a single (or several) PKS or NRPS modules. In still another embodiment, the PKS and/or NRPS modules or enzymatic domains are provided attached to a solid support and a fluid containing the growing macromolecule is passed over the surface, whereby the PKSs or NRPSs are able to react with the target substrate.

In one preferred embodiment, a combinatorial library of polyketides or polypeptides, or combinations thereof, is created by using automated means to facilitate the sequential introduction of a multitude of polymeric chains, each attached to a solid support, to a collection of solutions, each containing a single PKS or NRPS module. These automated means can be used to systematically vary the sequence by which each polymeric chain is introduced into the various solutions, thereby creating a combinatorial library. Numerous methods are well known in the art to create combinatorial libraries of molecules by the sequential addition of monomeric units, for example as described in WO 97/02358.

VII. Kits.

In still another embodiment, this invention provides kits for practice of the methods described herein. In one preferred embodiment, the kits comprise one or more containers containing nucleic acids encoding one or more of the lnm gene cluster ORFs and/or one or more of the lnm PKS or NRPS modules or enzymatic domains. Certain kits may comprise vectors encoding the lnm ORFs and/or cells containing such vectors. The kits may optionally include any reagents and/or apparatus to facilitate practice of the assays described herein. Such reagents include, but are not limited to, buffers, labels, labeled antibodies, bioreactors, cells, etc.

In addition, the kits may include instructional materials containing directions (i.e., protocols) for the practice of the methods of this invention. Preferred instructional materials provide protocols utilizing the kit contents for creating or modifying lnm module or ORF and/or for synthesizing or modifying a molecule using one or more lnm modules and/or enzymatic domains. While the instructional materials typically comprise written or printed materials they are not limited to such. Any medium capable of storing such instructions and communicating them to an end user is contemplated by this invention. Such media include, but are not limited to electronic storage media (e.g., magnetic discs, tapes, cartridges, chips), optical media (e.g., CD ROM), and the like. Such media may include addresses to internet sites that provide such instructional materials.

EXAMPLES

The following examples are offered to illustrate, but not to limit the claimed invention.

Example 1 The Biosynthetic Gene Cluster for the Antitumor Macrolactam Leinamycin from Streptomyces atroolivaceus: a Novel Approach for Identifying and Cloning Thiazole Biosynthetic Genes

In this example, we describe (1) the construction of a cosmid library of S. atroolivaceus, (2) PCR amplification of a conserved cyclase-domain probe, (3) identification and mapping of overlapping cosmid clones that cover the target Lnm biosynthetic gene cluster and flanking regions, (4) the purification and HPLC and mass spectral (MS) analyses of Lnm production in S. atroolivaceus, (5) the sequencing and sequence analysis of an 11 kb DNA from the gene cluster, (6) the development of a genetic system, and (7) the confirmation, by gene disruption to generate Lnm non-producing mutants, that the cloned gene cluster encodes Lnm biosynthesis.

Materials and Methods

Genomic DNA Isolation.

Standard protocols were followed in this study (Kieser et al. (2000) Practical Streptomyces genetics. The John Innes Foundation, Norwich, England; Sambrook et al. (1989) Molecular Cloning, A Laboratory Manual (2nd ed.). Cold Spring Harbor Laboratory Press).

Cosmid Library Construction.

(1) Cosmid vector pOJ446: The ready-for-use pOJ446 was from lab stock (Liu and Shen (2000) Antimicrob Agents Chemother. 44: 382-392; Smith et al. (2000) Antimicrob Agents Chemother. 44: 1809-1817; Du et al. (2000) Chem. Biol. 7: 623-642). The sample contained the HpaI-digested, dephosphorylated, and then BamHI-digested pOJ446, i.e., the mixture of a ˜2 kb arm and a ˜8 kb arm.

(2) Preparation of partially digested DNA inserts: Genomic DNA of S. atroolivaceus was partially digested by MboI (New England Biolabs, MA) following the standard protocol (Rao et al. (1987) Methods in Enzymol. 153: 166-198; Sambrook et al., supra). A total of 50 μg of DNA was diluted to 900 μl with 10 mM Tris-HCl (pH 8.0) buffer, and 100 μl of 10× buffer was added. Let the DNA solution sit on ice for at least 2 hrs. Label ten 1.5-ml eppendorf tubes and keep them on ice. Aliquot 40 μl of DNA solution to tube #1; aliquot 20 μl of DNA solution to each of the remaining tubes (#2 to #10). Ten units (1 μl) of MboI was added to tube #1 and mixed well on ice. Twenty μl of mixture #1 was transferred to tube #2 and mixed. Twenty μl of mixture #2 was transferred to tube #3 and mixed, and so on until tube #9. Discard 20 μl of mixture #9. No enzyme was added to tube #10. Incubate the 10 reaction mixtures in a 37° C. water bath for 30 min. Stop the reactions by adding 2 μl of 0.2 M EDTA (pH 8.0; final concentration 5 mM). Heat the reactions at 70° C. for 15 min to inactivate the enzyme. Add 4 μl of 6×DNA loading buffer to each tube and run 10 μl of each sample in a 0.3% gel. Use intact λ DNA as a control. Run the gel slowly for at least 4 hrs. Use the optimal conditions for large-scale partial digestion of DNA as follows: 30 μg DNA with 0.6 unit of MboI in a 300 μl volume. The partially digested DNA solution was diluted with an equal volume of ddH₂O, extracted once with phenol:chloroform and once with chloroform, and precipitated with ethanol. Redissolve the DNA in 180 μl ddH₂O and treat with 10 units of CIP (NEB) at 37° C. for 4 hrs. Repeat the phenol:chloroform extraction and ethanol precipitation. Finally, redissolve the DNA in 10 μl ddH₂O (˜2 μg/μl).
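The two-fold enzyme dilution series in step (2) can be tabulated with a short calculation. The sketch below is illustrative only; it assumes ideal mixing and ignores the 1 μl enzyme volume, neither of which is stated in the protocol. Under those assumptions tube #9 holds about 0.0195 unit of MboI per μg DNA, numerically close to the 0.02 unit per μg of the scale-up condition (0.6 unit for 30 μg); the protocol itself does not say which tube was judged optimal.

```python
# Sketch of the two-fold MboI dilution series described in step (2).
# Assumptions (not from the source): the 1 ul enzyme volume is ignored and
# mixing is ideal, so each 20 ul transfer carries exactly half the enzyme.

DNA_CONC_UG_PER_UL = 50 / 1000   # 50 ug DNA in 1000 ul total
FINAL_VOLUME_UL = 20             # volume left in every tube after transfers
START_UNITS = 10.0               # MboI units added to tube #1


def mboi_series(n_enzyme_tubes=9):
    """Return (tube number, MboI units, units per ug DNA) for all tubes."""
    dna_ug = DNA_CONC_UG_PER_UL * FINAL_VOLUME_UL  # 1 ug DNA per tube
    rows = []
    for tube in range(1, n_enzyme_tubes + 1):
        units = START_UNITS / 2 ** tube  # half stays, half moves on
        rows.append((tube, units, units / dna_ug))
    # Last tube is the no-enzyme control (tube #10 in the protocol).
    rows.append((n_enzyme_tubes + 1, 0.0, 0.0))
    return rows


if __name__ == "__main__":
    for tube, units, ratio in mboi_series():
        print(f"tube #{tube}: {units:.4f} U MboI, {ratio:.4f} U/ug DNA")
    # Scale-up condition quoted in the text: 0.6 U for 30 ug DNA
    print(f"scale-up: {0.6 / 30:.4f} U/ug DNA")
```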

(3) Vector-insert ligation: Five μl of MboI partially digested, dephosphorylated S. atroolivaceus DNA fragments (˜10 μg) was ligated with 2 μl of HpaI-cut, dephosphorylated, and BamHI-digested pOJ446 (˜4 μg) in a volume of 10 μl with 100 units of T4 DNA ligase (NEB) at 16° C. overnight.

(4) Packaging of the ligation mixture: Four μl of ligation mixture was packaged with one vial of Gigapack III XL extract (Stratagene, Calif.) according to manufacturer's instruction. Five hundred μl of SM buffer containing the packaged phage particles was obtained and stored at 4° C.

(5) Determination of the titer of the cosmid library and library amplification: A fresh suspension of host cell E. coli XL1-Blue MRF′ in 10 mM MgSO₄ was prepared according to the manufacturer's instructions (Stratagene). Twenty-five μl of 1:10 and 1:50 diluted phage particle solutions were mixed with 25 μl of host cells (OD₆₀₀=0.5) and incubated at 37° C. for 20 min. Four hundred fifty μl of LB medium was added to each sample, and the samples were incubated at 37° C. for 1 hr with gentle shaking. Aliquots (50 μl, 450 μl) of each sample were spread onto φ150 mm LB plates (with 100 μg/ml apramycin). Plates were incubated at 37° C. overnight. The number of colonies was counted and the titer of the cosmid library was calculated. Cosmid DNA was prepared from 16 random clones and digested with BamHI, and the fragments were separated in a 0.8% gel to test the quality of the library. An aliquot of the primary phage stock equivalent to 3 times the genome size of S. atroolivaceus was transduced into XL1-Blue MRF′ cells and grown on LB plates with 100 μg/ml apramycin. Cells of the colonies were collected into LB medium with apramycin. Sterile glycerol was added to a final concentration of 20%, and the cells were stored at −80° C. as the permanent cosmid library stock.
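The titer calculation in step (5) can be sketched as follows. The colony count used in the example is hypothetical, chosen only to show the arithmetic; the volumes and dilutions are those given in the protocol, and the function name is an illustrative assumption.

```python
def titer_cfu_per_ul(colonies, plated_ul, dilution_factor,
                     phage_ul=25.0, total_ul=500.0):
    """Estimate cfu per ul of undiluted packaged phage stock.

    colonies        -- colonies counted on one plate (hypothetical input)
    plated_ul       -- volume spread on that plate (50 or 450 ul in the text)
    dilution_factor -- 10 for the 1:10 dilution, 50 for the 1:50 dilution
    phage_ul        -- diluted phage mixed with host cells (25 ul in the text)
    total_ul        -- 25 ul phage + 25 ul cells + 450 ul LB
    """
    # Volume of undiluted stock represented by the plated aliquot.
    stock_equiv_ul = phage_ul / dilution_factor * (plated_ul / total_ul)
    return colonies / stock_equiv_ul


if __name__ == "__main__":
    # Hypothetical count: 84 colonies on the 50 ul plate of the 1:10 sample.
    titer = titer_cfu_per_ul(colonies=84, plated_ul=50, dilution_factor=10)
    print(f"titer: {titer:.0f} cfu/ul")
    print(f"total in 500 ul SM buffer: {titer * 500:.3g} cfu")
```

A count in this range would correspond to a titer near the 335 cfu/μl reported for the primary library in the Results (335 cfu/μl × 500 μl = 1.675×10⁵ cfu).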

Degenerate PCR Primers.

The primer set for cyclase-domain amplification was: CyFP (5′-GCGCCACGAGCCGTTYCCNYTNAC-3′, SEQ ID NO:215) and CyRP (5′-GCCCAGGTTGGAGGTGARSACNACNGG-3′, SEQ ID NO:216); the primer set for oxidase-domain amplification was: OxFP (5′-GCCGGCTCCACCTACCCNGTNCARAC-3′, SEQ ID NO:217) and OxRP (5′-CATCAGCAGCTGGGTCATGKMNCCNGCYTC-3′, SEQ ID NO:218). These primers were designed by the “Consensus-Degenerate Hybrid Oligonucleotide Primer” strategy (Rose et al. (1998) Nucleic Acids Res. 26: 1628-1635) and synthesized by the campus oligo synthesis facility (UC-Davis, CA). The primer set for PKS was: PKSFP (5′-GCSTCCCGSGACCTGGGCITCGACTC-3′, SEQ ID NO:219) and PKSRP (5′-AGSGASGASGAGCAGGCGGTSTCSAC-3′, SEQ ID NO:220); the primer set for PTS was: PTSFP (5′-ATCTACACSTCSGGCACSACSGGCAAGCCSAAGGG-3′, SEQ ID NO:221) and PTSRP (5′-AWTGAGKSICCICCSRRGIMGAAGAA-3′, SEQ ID NO:222). These primers were obtained from lab stock.
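The degeneracy of the cyclase-domain primers listed above follows from the standard IUPAC nucleotide codes. The sketch below, which is an illustration and not part of the original methods, counts and expands the individual sequences encoded by CyFP and CyRP; the function names are assumptions. Primers containing "I" are not handled here.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

CYFP = "GCGCCACGAGCCGTTYCCNYTNAC"      # SEQ ID NO:215
CYRP = "GCCCAGGTTGGAGGTGARSACNACNGG"   # SEQ ID NO:216


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """Yield every concrete sequence encoded by a degenerate primer."""
    for combo in product(*(IUPAC[base] for base in primer)):
        yield "".join(combo)


if __name__ == "__main__":
    for name, seq in (("CyFP", CYFP), ("CyRP", CYRP)):
        print(name, len(seq), "nt, degeneracy", degeneracy(seq))
```

Both primers carry their degenerate positions toward the 3′ end, after a fixed 5′ stretch, which is the layout the consensus-degenerate hybrid design calls for.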

PCR Amplification

All PCR reactions were performed on a GeneAmp 2400 (Perkin Elmer, Calif.) thermocycler with touchdown programs. A typical PCR reaction (50 μl volume) contained 1×PCR buffer with 1.5 mM MgCl₂, 5 ng of template, 7.5% DMSO, 100 nM dNTPs, 25 pmol of each primer, and 2.5 units of Taq DNA polymerase (Boehringer Mannheim Biochemicals, IN). For cyclase-domain amplification, the program was: pre-run denaturation (4 min at 94° C.)→10 cycles of ramp amplification (45 sec denaturation at 94° C.→1 min annealing starting from 65° C. to 55° C. in 10 cycles→1.5 min extension at 72° C.)→25 cycles of amplification with a constant annealing temperature at 55° C.→post-run extension (10 min at 72° C.). For oxidase-domain amplification, the PCR program was the same except that the ramp temperature was from 60° C. to 50° C. and the extension time was 45 sec. For PKS PCR, the extension time was 1 min instead. For PTS PCR, the program was identical to that for cyclase-domain amplification. For amplification of a large fragment containing the cyclase-domain at one end and the oxidase-domain at the other, CyFP and OxRP were used as the PCR primer set. The ramp temperature was from 65° C. to 55° C., and the extension time was 5 min.
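The touchdown programs above can be written out cycle by cycle. The sketch below assumes the annealing temperature drops in equal steps of 1° C. per ramp cycle (65° C. down to 56° C. over 10 cycles, then 55° C. constant); the text gives only the start and end temperatures, so the step size is an assumption. The function name and dictionary keys are illustrative.

```python
def touchdown_cycles(start_anneal=65.0, end_anneal=55.0, ramp_cycles=10,
                     constant_cycles=25, extension_min=1.5):
    """Return the per-cycle settings of a touchdown PCR program.

    Assumption: annealing temperature falls by an equal step each ramp
    cycle and reaches `end_anneal` at the first constant cycle.
    """
    step = (start_anneal - end_anneal) / ramp_cycles
    cycles = []
    for i in range(ramp_cycles):
        cycles.append({"denature_c": 94, "denature_s": 45,
                       "anneal_c": start_anneal - i * step, "anneal_s": 60,
                       "extend_c": 72, "extend_min": extension_min})
    for _ in range(constant_cycles):
        cycles.append({"denature_c": 94, "denature_s": 45,
                       "anneal_c": end_anneal, "anneal_s": 60,
                       "extend_c": 72, "extend_min": extension_min})
    return cycles


if __name__ == "__main__":
    cyclase = touchdown_cycles()                       # Cy-domain program
    oxidase = touchdown_cycles(60.0, 50.0, extension_min=0.75)
    print("pre-run: 4 min at 94 C")
    for n, c in enumerate(cyclase, start=1):
        print(f"cycle {n:2d}: anneal {c['anneal_c']:.0f} C, "
              f"extend {c['extend_min']} min")
    print("post-run: 10 min at 72 C")
```

The PKS and long-fragment variants differ only in `extension_min` (1 and 5 min, respectively), so the same function covers them.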

Subcloning and Sequencing

PCR mixtures were separated in agarose gels. Fragments of interest were recovered with a Gel Extraction kit (Qiagen, Calif.). PCR fragments were subcloned into the pGEM-T Easy vector (Promega, Wis.). Other restriction enzyme-generated DNA fragments were subcloned into the appropriate sites of the pBSSK vector (Stratagene), pSP72, or the pGEM series (Promega). Recombinant plasmid DNAs were prepared with a QiaPrep Spin Miniprep kit (Qiagen). DNA sequencing was performed by Davis DNA Sequencing, Inc. (Davis, Calif.). DNA sequences were analyzed with the GCG package (GCG Inc., WI) and searched by BLAST against NCBI databases.

Library Screening.

The PCR-amplified 1.1-kb cyclase-domain fragment was labelled with digoxigenin (BMB) and used as a probe to screen the cosmid library following the standard colony-hybridization procedure (Sambrook et al., supra). Positive clones were isolated and their cosmid DNAs were prepared by the alkali lysis method (Id.). Subclones from the left end of Cosmid 11 and the right end of Cosmid 1 (pYC9-2.1 and pYC25, respectively) were used as probes to perform cosmid walking.

Cosmid Clone Mapping and Shotgun Subcloning.

Cosmid DNA was digested with BamHI, NcoI or AatII and separated in 0.8% agarose gels. The patterns of fragments were analyzed to generate a contig map. Southern hybridizations were performed to validate the initial screening and the mapping. All 18 BamHI fragments (except the liberated 8 kb cosmid vector) from two overlapping cosmids (Cosmid 1 and 11) were subcloned into the BamHI site of pBSSK by the shotgun cloning method.

Fermentation, Purification and HPLC Analysis of Leinamycin

The fermentation method was adopted from Hara et al. (1989) J. Antibiot. 42: 1768-1774. Fifty ml of seed culture was allowed to grow for 48 hrs at 22° C. with adequate shaking, and 10 ml was used to inoculate 2×50 ml of fermentation broth. The culture was allowed to ferment for 48 hrs at 22° C. Moisturized Diaion HP-20 resin (SUPELCO, PA) was added to 5% (W/V) at 18 hrs after inoculation. The fermented broth was pooled and adjusted to pH 2.0 with H₂SO₄. The resin was recovered from the broth by filtration through two layers of cheesecloth. The crude Lnm preparation was eluted from the resin with 10 vol of methanol and the solvent was concentrated by vacuum evaporation. HPLC separation of leinamycin from contaminants was performed with a Microsorb MV C18 reverse-phase column (Varian, Calif.) on a Dynamax SD-200 HPLC system (Rainin, Calif.). Authentic leinamycin (10 mg/ml) was provided by Tokyo Research Laboratories (Kyowa Hakko Kogyo Co., Japan).

PEG-Mediated Protoplast Transformation of S. atroolivaceus.

A 1.1-kb PCR-amplified Cy-domain fragment was subcloned into (1) the EcoRI site of pOJ260 to make the construct pYC12 and (2) the EcoRI site of pKC1139 to make the second construct pYC20. DNA of pYC12 and pYC20 was prepared from the non-methylating host strain ET12567, denatured with alkali solution, and used for PEG-mediated protoplast transformation (Liu and Shen (2000) Antimicrob Agents Chemother. 44: 382-392).

Conjugation Between S. atroolivaceus and E. coli S17-1

Conjugation between S. atroolivaceus and E. coli S17-1 was performed by a previously described procedure (Liu and Shen (2000) Antimicrob Agents Chemother. 44: 382-392) with modifications. Three exconjugates (named YC12C1, YC12C2 and YC12C3) obtained with pYC12 were further purified, and their genomic DNA was prepared for Southern analysis.

Results and Discussion

S. atroolivaceus Cosmid Library Construction

High molecular-weight genomic DNA was isolated from S. atroolivaceus by a modified procedure (Rao et al. (1987) Methods in Enzymol. 153: 166-198; Kieser et al. (2000) Practical Streptomyces genetics. The John Innes Foundation, Norwich, England). The major modification was to extend the lysis time to up to 60 min. The cell wall of S. atroolivaceus seemed to be quite resistant to lysozyme treatment. Initial attempts that used a 30-min lysis time yielded only DNA samples with an average size of ˜40 kb. Partial digestion by MboI was done with decreasing amounts of enzyme. A portion of each digested sample was analyzed on an agarose gel to identify the conditions giving genomic DNA fragments of the correct size for library construction.

The S. atroolivaceus primary cosmid library contains about 1.675×10⁵ colony-forming units (cfu) in 500 μl SM buffer (335 cfu/μl), which is equivalent to 50 times the genome size. Analyses by alkali mini-preparation and BamHI digestion of 16 randomly selected cosmid clones indicated that the average size of inserts is about 40 kb. The library was amplified once in XL1-Blue cells. Sterile glycerol was added to the cell culture to 20% final concentration. Aliquots (10 ml each) of the amplified library were stored at −80° C. The titer of this amplified library was about 10⁵ cfu/μl.

PCR Amplification of a Cyclase (Cy)-Domain.

An expected 1.1 kb fragment was amplified with the cyclase-domain PCR primer set (CyFP+CyRP). Although the control reactions with a single primer also yielded amplified bands, the 1.1-kb fragment appeared to be unique to the reaction containing both primers. This fragment was cloned into the pGEM-T Easy vector (Promega) as pYC1. Both ends of pYC1 were sequenced. BLAST results indicated its homology to the cyclase-domains of known NRPSs.

Mapping the Lnm Cluster, Subcloning and Partial Sequencing

The digoxigenin-labelled 1.1-kb Cy-domain probe was used to screen about 4×1000 cfu of the cosmid library, and 15 well-isolated positive clones were obtained. Cosmid DNA from 12 clones was prepared and subjected to enzyme (BamHI) digestion, Southern hybridization and PCR diagnosis for the presence of Cy-domain, Ox-domain, PKS-domain and/or PTS-domain.

Initially, four cosmids (#1, #2, #6 and #11) were found to cover the longest chromosome region (total of 63 kb), and the PCR diagnosis data supported the predicted domain organization. Later, the other ends of cosmid #1 and #11 were subcloned as pYC25 and pYC9-2.1, respectively, and were used as probes to obtain more cosmid clones extending in both directions. The relative positions of those cosmids have been mapped and are illustrated in FIG. 2. Subsequently, the sequence of the entire lnm gene cluster and its flanking regions (˜135 kb) was determined and analyzed (see FIG. 3). Deduced functions of the open reading frames (ORFs) in the leinamycin biosynthetic gene cluster are summarized herein in Table 1 and Table 2.

The Development of a Genetic System for S. atroolivaceus

S. atroolivaceus grows very well at 30° C.; it does not grow at temperatures above 35° C. It was found to be highly sensitive to both apramycin (Am) and thiostrepton (Thio) (complete inhibition of growth at 10 μg/ml and 3 μg/ml, respectively). Thus, the E. coli-Streptomyces shuttle vectors pOJ260 (suicide vector, Am^(R)) and pKC1139 (self-replicating vector, Am^(R)) were chosen to make the gene-disruption constructs pYC12 and pYC20, respectively. The concentration of antibiotic (Am) for selection of transformants/exconjugates was 50 μg/ml.

TSB medium gave the best yield for protoplast preparation; however, the regeneration ratio was very low (less than 0.01%). Protoplasts made from TSB with 30% sucrose had a regeneration ratio of 0.2%. YEME gave the highest protoplast regeneration ratio, 0.6%. All the protoplast samples were regenerated on R2YE plates supplemented with 10 mM MgCl₂. Approximately 1×10⁹ protoplasts and 2 μg DNA were used for each transformation experiment. Trial experiments were performed with the pYC20 construct. Seven apparent positive transformants were obtained from the initial selection plates. Only 1 remained well-growing after 2 rounds of re-inoculation on vegetative-growth medium (TSB) and spore-generation medium (ISP-4). The calculated transformation efficiency was about 8×10⁻⁸.

Since the combined efficiency of transformation and regeneration was only about 1×10⁻⁹, which is very low, a conjugation approach was pursued, with surprising success. Spores (1×10⁹) were heat-shocked for 20 min at 42° C. instead of 10 min at 50° C., followed by incubation at 30° C. for up to 6 hours. The germination of spores was monitored by microscopic checks every 30 min from 4 hours after heat-shock. E. coli S17-1 (bearing either pYC12 or pYC20) culture was freshly prepared. Conjugation was conducted on modified ISP-4 medium. After incubation at 28° C. for 5 days, about 18 and 500 apparent positive exconjugates grew out from the initial selection plates with pYC12 and pYC20, respectively. Therefore, the calculated conjugation/integration efficiency for the non-self-replicating construct pYC12 was approximately 1.8×10⁻⁸, and the conjugation efficiency for the self-replicating construct pYC20 was 5×10⁻⁷. Three exconjugates (named YC12C1, YC12C2 and YC12C3) obtained with pYC12 were further purified and analyzed.
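The efficiencies quoted above are consistent with dividing the number of exconjugates by the number of recipient spores. The sketch below reproduces that arithmetic; treating the 1×10⁹ spores as the denominator is our reading of the text, and the function name is illustrative.

```python
def efficiency(colonies, recipients):
    """Colonies recovered per recipient cell (spore or protoplast)."""
    return colonies / recipients


SPORES = 1e9                 # recipient spores per conjugation (from the text)
PROTOPLAST_EFFICIENCY = 1e-9 # combined transformation/regeneration efficiency

if __name__ == "__main__":
    for construct, colonies in (("pYC12", 18), ("pYC20", 500)):
        eff = efficiency(colonies, SPORES)
        fold = eff / PROTOPLAST_EFFICIENCY
        print(f"{construct}: {eff:.1e} per spore, "
              f"about {fold:.0f}-fold over protoplast transformation")
```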

Genomic DNA hybridization confirmed the correct disruption of the target NRPS gene in S. atroolivaceus. With the Cy-domain as probe, wild type gave a hybridized band of 4.6 kb in size with NcoI digestion, while all three independent exconjugates (YC12C1 to C3, which appeared to be identical) gave a band of 9.2 kb, which is the expected size after a single-crossover integration of the vector. With the vector (pOJ260) DNA as probe, wild type DNA did not hybridize, while the three exconjugates gave the same 9.2 kb positive band.

Analysis of Lnm Production in S. atroolivaceus Wild Type and Mutants

A crude Lnm sample was extracted from 50 ml of 4-day 2-step fermentation broth, following the described procedures (Hara et al. (1989) J. Antibiot. 42: 1768-1774). Lnm production was further analyzed by HPLC (FIG. 10). Samples purified by HPLC were also subjected to mass spectrometry analysis. Lnm has the molecular formula of C₂₂H₂₆N₂O₆S₃ and a molecular weight of 511.2 (Hara et al. (1989) J. Antibiot. 42: 1768-1774). The dominant MS peak was found in the wild type Lnm sample but not in YC12C1. This confirmed the HPLC results that the missing HPLC peak is indeed that of Lnm (FIG. 10).
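As a cross-check on the mass value used in the MS analysis, the formula C₂₂H₂₆N₂O₆S₃ can be converted to a mass with standard atomic weights. In the sketch below the neutral average mass comes out near 510.6 and the singly protonated monoisotopic ion near 511.1. Reading the quoted value of 511.2 as that of a protonated ion is an inference on our part and is not stated in the source; the atomic masses are rounded standard values and the function names are illustrative.

```python
import re

# Rounded standard atomic weights (average) and monoisotopic masses.
AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401,
                "O": 15.99491462, "S": 31.97207117}
PROTON = 1.00727647  # mass added for a singly protonated [M+H]+ ion


def parse_formula(formula):
    """Turn 'C22H26N2O6S3' into {'C': 22, 'H': 26, ...}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts


def mass(formula, table):
    """Sum the atomic masses in `table` over the atoms of `formula`."""
    return sum(table[el] * n for el, n in parse_formula(formula).items())


if __name__ == "__main__":
    lnm = "C22H26N2O6S3"
    print(f"average mass:          {mass(lnm, AVERAGE):.2f}")
    print(f"monoisotopic mass:     {mass(lnm, MONOISOTOPIC):.2f}")
    print(f"monoisotopic [M+H]+:   {mass(lnm, MONOISOTOPIC) + PROTON:.2f}")
```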

In summary, a 180 kb gene cluster encoding lnm biosynthesis was cloned from S. atroolivaceus, 135 kb of which was sequenced. Sequence analysis revealed that Lnm is biosynthesized by a hybrid nonribosomal peptide synthetase and polyketide synthase system. These genes can now be used to improve the production of leinamycin, to engineer microbial strains for the production of novel leinamycin analogs as drug leads, and as genetic materials in combination with other nonribosomal peptide synthetase genes, polyketide synthase genes, and genes encoding other enzymes responsible for natural product biosynthesis in combinatorial biosynthesis to generate chemical structural diversity.

An efficient genetic system for in vivo manipulation of natural product biosynthesis in S. atroolivaceus has been developed. Conditions for the introduction of plasmid DNA into S. atroolivaceus have been optimized for both protoplast-mediated transformation and E. coli-S. atroolivaceus conjugation. Genetic engineering of leinamycin biosynthesis in S. atroolivaceus has been demonstrated by disrupting the NRPS module, resulting in the isolation of Lnm non-producing S. atroolivaceus mutants. The latter not only confirmed that the cloned gene cluster encodes Lnm biosynthesis but also demonstrated the feasibility of making novel Lnm analogs in S. atroolivaceus by manipulating genes governing Lnm biosynthesis.

It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. All publications, patents, and patent applications cited herein are hereby incorporated by reference in their entirety for all purposes. 

1. An isolated nucleic acid comprising a nucleic acid selected from the group consisting of a nucleic acid encoding one or more leinamycin (lnm) open reading frames (ORFs) identified in Tables 1 and 2 (ORFs −35 through −1, lnmA through lnmZ and +1 through +9); a nucleic acid encoding a polypeptide encoded by any one or more of leinamycin (lnm) open reading frames (ORFs) identified in Tables 1 and 2 (ORFs −35 through −1, lnmA through lnmZ and +1 through +9); a nucleic acid comprising the nucleotide sequence of a nucleic acid amplified by polymerase chain reaction (PCR) using any one of the primer pairs identified in Table 2 and the nucleic acid of a leinamycin-producing organism as a template; and a nucleic acid that encodes a protein comprising at least one catalytic domain selected from the group consisting of a condensation (C) domain, an adenylation (A) domain, a peptidyl carrier protein (PCP) domain, a condensation/cyclization domain (Cy), an acyl-carrier protein (ACP)-like domain, an oxidization domain (Ox), an NADH dehydrogenase domain, a methyltransferase domain, a phosphotransferase domain, a peptide synthetase domain, and an aminotransferase domain, and that specifically hybridizes to one or more leinamycin (lnm) open reading frames (ORFs) identified in Tables 1 and 2 (ORFs −35 through −1, lnmA through lnmZ, and +1 through +9) under stringent conditions.
 2. The isolated nucleic acid of claim 1, wherein said nucleic acid comprises a nucleic acid encoding at least two open reading frames identified in Tables 1 and 2, said open reading frames being selected from the group consisting of −35, −34, −33, −32, −31, −30, −29, −28, −27, −26, −25, −24, −23, −22, −21, −20, −19, −18, −17, −16, −15, −14, −13, −12, −11, −10, −9, −8, −7, −6, −5, −4, −3, −2, −1, lnmA, lnmB, lnmC, lnmD, lnmE, lnmF, lnmG, lnmH, lnmI, lnmJ, lnmK, lnmL, lnmM, lnmN, lnmO, lnmP, lnmQ, lnmR, lnmS, lnmT, lnmU, lnmV, lnmW, lnmX, lnmY, lnmZ, +1, +2, +3, +4, +5, +6, +7, +8, and +9.
 3. The isolated nucleic acid of claim 1, wherein said nucleic acid comprises a nucleic acid encoding at least three open reading frames identified in Tables 1 and 2, said open reading frames being selected from the group consisting of −35, −34, −33, −32, −31, −30, −29, −28, −27, −26, −25, −24, −23, −22, −21, −20, −19, −18, −17, −16, −15, −14, −13, −12, −11, −10, −9, −8, −7, −6, −5, −4, −3, −2, −1, lnmA, lnmB, lnmC, lnmD, lnmE, lnmF, lnmG, lnmH, lnmI, lnmJ, lnmK, lnmL, lnmM, lnmN, lnmO, lnmP, lnmQ, lnmR, lnmS, lnmT, lnmU, lnmV, lnmW, lnmX, lnmY, lnmZ, +1, +2, +3, +4, +5, +6, +7, +8, and +9.
 4. The isolated nucleic acid of claim 1, wherein said nucleic acid encodes a module.
 5. An isolated nucleic acid of claim 4, comprising a nucleic acid encoding a module comprising two or more catalytic domains of a protein encoded by a nucleic acid of a leinamycin (lnm) gene cluster wherein said catalytic domains are selected from the group consisting of a condensation (C) domain, an adenylation (A) domain, a peptidyl carrier protein (PCP) domain, a condensation/cyclization domain (Cy), an acyl-carrier protein (ACP)-like domain, an oxidization domain (Ox), an enoyl reductase domain, a methyltransferase domain, a phosphotransferase domain, a peptide synthetase domain, and an aminotransferase domain.
 6. The isolated nucleic acid of claim 1, wherein said nucleic acid comprises an open reading frame from SEQ ID NO: 1 or the complement of SEQ ID NO:1.
 7. The isolated nucleic acid of claim 1, wherein said nucleic acid has the nucleotide sequence of a nucleic acid amplified by polymerase chain reaction (PCR) using any one of the primer pairs identified in Table 2 and the nucleic acid of a leinamycin-producing organism as a template.
 8. An isolated nucleic acid comprising a leinamycin (lnm) open reading frame (ORF) or an allelic variant thereof.
 9. The nucleic acid of claim 8, wherein said nucleic acid comprises a nucleic acid that is a single nucleotide polymorphism (SNP) of a leinamycin (lnm) open reading frame (ORF).
 10. An isolated gene cluster comprising open reading frames encoding polypeptides sufficient to direct the assembly of a leinamycin.
 11. An isolated multi-functional protein complex comprising both a polyketide synthase (PKS) and a peptide synthetase (NRPS), wherein said polyketide synthase (PKS) or said peptide synthetase (NRPS) have the amino acid sequence of a PKS or an NRPS found encoded by a nucleic acid from the leinamycin gene cluster.
 12. An isolated nucleic acid encoding a multi-functional protein complex comprising both a polyketide synthase (PKS) and a peptide synthetase (NRPS), wherein said polyketide synthase or said peptide synthetase, in its native state, is present in a leinamycin (lnm) gene cluster.
 13. An isolated polypeptide comprising a polypeptide selected from the group consisting of: a catalytic domain encoded by one or more leinamycin (lnm) open reading frames (ORFs) identified in Tables 1 and 2 (ORFs −35 through −1, lnmA through lnmZ and +1 through +9); a catalytic domain encoded by a nucleic acid having the sequence of a nucleic acid amplified by polymerase chain reaction (PCR) using any one of the primer pairs identified in Table 2; and a module comprising two or more catalytic domains of a protein encoded by a nucleic acid of a leinamycin gene cluster.
 14. The polypeptide of claim 13, wherein said polypeptide comprises an enzymatic domain selected from the group consisting of a condensation (C) domain, an adenylation (A) domain, a peptidyl carrier protein (PCP) domain, a condensation/cyclization domain (Cy), an acyl-carrier protein (ACP)-like domain, an oxidization domain (Ox), an enoyl reductase domain, a methyltransferase domain, a phosphotransferase domain, a peptide synthetase domain, and an aminotransferase domain.
 15. The polypeptide of claim 13, wherein the nucleic acid of a leinamycin gene cluster comprises a nucleic acid encoding at least two open reading frames identified in Tables 1 and 2, said open reading frames being selected from the group consisting of −35, −34, −33, −32, −31, −30, −29, −28, −27, −26, −25, −24, −23, −22, −21, −20, −19, −18, −17, −16, −15, −14, −13, −12, −11, −10, −9, −8, −7, −6, −5, −4, −3, −2, −1, lnmA, lnmB, lnmC, lnmD, lnmE, lnmF, lnmG, lnmH, lnmI, lnmJ, lnmK, lnmL, lnmM, lnmN, lnmO, lnmP, lnmQ, lnmR, lnmS, lnmT, lnmU, lnmV, lnmW, lnmX, lnmY, lnmZ, +1, +2, +3, +4, +5, +6, +7, +8, and +9.
 16. The polypeptide of claim 13, wherein said nucleic acid of a leinamycin gene cluster comprises a nucleic acid encoding at least three open reading frames identified in Tables 1 and 2, said open reading frames being selected from the group consisting of −35, −34, −33, −32, −31, −30, −29, −28, −27, −26, −25, −24, −23, −22, −21, −20, −19, −18, −17, −16, −15, −14, −13, −12, −11, −10, −9, −8, −7, −6, −5, −4, −3, −2, −1, lnmA, lnmB, lnmC, lnmD, lnmE, lnmF, lnmG, lnmH, lnmI, lnmJ, lnmK, lnmL, lnmM, lnmN, lnmO, lnmP, lnmQ, lnmR, lnmS, lnmT, lnmU, lnmV, lnmW, lnmX, lnmY, lnmZ, +1, +2, +3, +4, +5, +6, +7, +8, and +9.
 17. The polypeptide of claim 13, wherein said polypeptide comprises a module comprising two or more catalytic domains of a protein encoded by a nucleic acid of a leinamycin gene cluster wherein said catalytic domains are selected from the group consisting of a condensation (C) domain, an adenylation (A) domain, a peptidyl carrier protein (PCP) domain, a condensation/cyclization domain (Cy), an acyl-carrier protein (ACP)-like domain, an oxidization domain (Ox), an enoyl reductase domain (ER), a methyltransferase domain, a phosphotransferase domain, a peptide synthetase domain, and an aminotransferase domain.
 18. An isolated polypeptide comprising a module wherein said module is specifically bound by an antibody that specifically binds to a leinamycin (lnm) module.
 19. The polypeptide of claim 18, wherein said polypeptide is specifically bound by an antibody that specifically binds to a polypeptide encoded by a leinamycin open reading frame.
 20. An expression vector comprising a nucleic acid of any one of claims 1 through 12.
 21. A host cell transformed with an expression vector of claim 20.
 22. The host cell of claim 21, wherein said cell is transformed with an exogenous nucleic acid comprising a gene cluster encoding polypeptides sufficient to direct the assembly of a leinamycin or leinamycin analog.
 23. The cell of claim 21, wherein said cell is a bacterial cell.
 24. The cell of claim 23, wherein said cell is a Streptomyces cell.
 25. The cell of claim 21, wherein said cell is a eukaryotic cell.
 26. The cell of claim 21, wherein said cell is an insect cell.
 27. A method of chemically modifying a molecule, said method comprising contacting a molecule that is a substrate for a polypeptide encoded by one or more leinamycin biosynthesis gene cluster open reading frames with a polypeptide encoded by one or more leinamycin biosynthesis gene cluster open reading frames, whereby said polypeptide chemically modifies said molecule.
 28. The method of claim 27, wherein said method comprises contacting said molecule with at least two different polypeptides encoded by leinamycin (lnm) gene cluster open reading frames.
 29. The method of claim 27, wherein said method comprises contacting said molecule with at least three different polypeptides encoded by leinamycin (lnm) gene cluster open reading frames.
 30. The method of claim 27, wherein said contacting is in a host cell.
 31. The method of claim 30, wherein said host cell is a bacterium.
 32. The method of claim 27, wherein said contacting is ex vivo.
 33. The method of claim 27, wherein said molecule is an endogenous metabolite produced by said host cell.
 34. The method of claim 27, wherein said molecule is an exogenously supplied metabolite.
 35. The method of claim 27, wherein said host cell is a eukaryotic cell.
 36. The method of claim 35, wherein said eukaryotic cell is selected from the group consisting of a mammalian cell, a yeast cell, a plant cell, a fungal cell, and an insect cell.
 37. The method of claim 27, wherein said molecule is an amino acid and said polypeptide is a peptide synthetase.
 38. The method of claim 27, wherein said polypeptide is an amino transferase.
 39. A cell that overexpresses leinamycin.
 40. The cell of claim 39, wherein said cell overexpresses a polypeptide encoded by leinamycin open reading frame lnmL.
 41. A cell that produces leinamycin, wherein one or more proteins that synthesize said leinamycin are encoded by one or more heterologous nucleic acids.
 42. The cell of claim 41, wherein said heterologous nucleic acids comprise at least three open reading frames identified in Tables 1 and 2, said open reading frames being selected from the group consisting of −35, −34, −33, −32, −31, −30, −29, −28, −27, −26, −25, −24, −23, −22, −21, −20, −19, −18, −17, −16, −15, −14, −13, −12, −11, −10, −9, −8, −7, −6, −5, −4, −3, −2, −1, lnmA, lnmB, lnmC, lnmD, lnmE, lnmF, lnmG, lnmH, lnmI, lnmJ, lnmK, lnmL, lnmM, lnmN, lnmO, lnmP, lnmQ, lnmR, lnmS, lnmT, lnmU, lnmV, lnmW, lnmX, lnmY, lnmZ, +1, +2, +3, +4, +5, +6, +7, +8, and +9.
 43. A method of coupling a first amino acid to a second amino acid, said method comprising contacting the first and second amino acid with a recombinantly expressed leinamycin nonribosomal peptide synthetase (NRPS).
 44. The method of claim 49, wherein said NRPS is selected from the group consisting of NRPS-1, and NRPS-2.
 45. The method of claim 49, wherein said contacting is in a host cell.
 46. A method of coupling a first fatty acid to a second fatty acid, said method comprising contacting the first and second fatty acids with a recombinantly expressed leinamycin polyketide synthase (PKS).
 47. The method of claim 46, wherein said PKS is selected from the group consisting of lnm PKS-1, PKS-2, PKS-3, PKS-4, PKS-5, and PKS-6.
 48. The method of claim 46, wherein said contacting is in a host cell.
 49. A method of producing a leinamycin or leinamycin analog, said method comprising: providing a cell transformed with an exogenous nucleic acid comprising a leinamycin gene cluster encoding polypeptides sufficient to direct the assembly of said leinamycin or leinamycin analog; culturing the cell under conditions permitting the biosynthesis of leinamycin or leinamycin analog; and isolating said leinamycin or leinamycin analog from said cell.
 50. A method of producing a leinamycin analog, said method comprising: providing a cell comprising a leinamycin gene cluster; transfecting the cell with a nucleic acid that alters the leinamycin gene cluster through homologous recombination so that the gene cluster encodes a biosynthetic pathway that synthesizes said leinamycin analog; culturing the cell under conditions permitting the biosynthesis of the leinamycin analog; and isolating the leinamycin analog from the cell.
 51. An isolated nucleic acid comprising a nucleic acid encoding a phosphopantetheinyl transferase, said nucleic acid encoding a phosphopantetheinyl transferase being selected from the group consisting of: a nucleic acid encoding the protein comprising the amino acid sequence encoded by lmp of FIG. 5; and a nucleic acid encoding a polypeptide having phosphopantetheinyl transferase activity, where said nucleic acid specifically hybridizes to the nucleic acid having the sequence encoded by lmp of FIG. 5 under stringent conditions.
 52. The nucleic acid of claim 51, wherein said nucleic acid comprises the sequence of lmp in FIG. 5.
 53. The nucleic acid of claim 51, wherein said nucleic acid comprises a vector.
 54. A polypeptide comprising a phosphopantetheinyl transferase encoded by a nucleic acid of claim 51.
 55. A vector comprising the nucleic acid of claim 51.
 56. A cell transfected with the vector of claim 55.
 57. A method of converting an apo-carrier protein to a holo-carrier protein comprising reacting said apo-carrier protein with a recombinant phosphopantetheinyl transferase encoded by the nucleic acid of claim 51 and coenzyme A thereby producing a holo-carrier protein.
 58. A cell comprising a modified leinamycin gene cluster nucleic acid, said cell producing elevated amounts of leinamycin as compared to the wild type cell.
 59. The cell of claim 58, wherein said cell overexpresses a resistance gene from the leinamycin gene cluster.
 60. The cell of claim 59, wherein said resistance gene is a gene listed in Table 2.
 61. An antibody that specifically binds to a polypeptide encoded by an lnm open reading frame identified in Table 2.
 62. The antibody of claim 61, wherein said antibody is a single chain antibody.